Abstract: A repeated game is an effective tool to model
interactions and conflicts among players aiming to achieve their
objectives on a long-term basis. In contrast to static noncooperative
games, which model an interaction among players in only one
period, in repeated games the interactions of players repeat over
multiple periods; the players thus become aware of other
players’ past behaviors and of their own future benefits, and adapt
their behavior accordingly. In wireless networks, conflicts among
wireless nodes can lead to selfish behaviors, resulting in poor
network performance and detrimental individual payoffs. In this
paper, we survey the applications of repeated games in different
wireless networks. The main goal is to demonstrate the use of
repeated games to encourage wireless nodes to cooperate, thereby
improving network performance and avoiding network disruption
due to selfish behaviors. Furthermore, various problems
in wireless networks and variations of repeated game models,
together with the corresponding solutions, are discussed in this
survey. Finally, we outline some open issues and future research
directions.

Keywords- Repeated games, wireless networks, game theory,
Folk theorem, subgame perfect equilibrium.

I. Introduction 

Game theory is a branch of applied mathematics, which is 
used to study interactions among “intelligent” and “rational” 
participants (i.e., players) in a multi-agent decision making 
process (i.e., a game). In a game, the players want and are 
able to choose optimal strategies to maximize their benefits or 
payoffs. Game theory is widely used in many areas such as
economics, computer science, military science, evolutionary biology, and
so on. In general, games can be divided into two main
types, i.e., static and dynamic games. Static games model an
interaction among players when they take actions only once in
a single period. By contrast, dynamic games are applied when
players take actions over multiple periods. Because a dynamic
game is played over successive periods, the players can
observe the behaviors of the other
players in the past and are able to adjust their strategies to
achieve their goals. In this paper, we focus on one of the most
important types of dynamic games, namely repeated games 
and their applications in wireless networks. 

Although there are some existing materials and resources
presenting the applications of game theory in wireless networks
such as m, m, there exists no survey specifically on
the repeated game models developed for wireless networks.
This motivates us to deliver this survey with the objective
of providing the necessary and fundamental information about
repeated game models in wireless networks. Hence, through
this article, the readers will understand how repeated games
can be used to address different issues in wireless networks.
Moreover, the rapid development of wireless networks brings
many benefits for human beings; however, it also poses
many challenges for researchers. Future wireless networks
often have unique characteristics that differ from those of conventional
wireless networks. For example, in mobile cloud computing CO ,
we need to take into account not only the impact of wireless
transmissions but also that of cloud computing. As a
result, in order to apply repeated games to problems in
next-generation wireless networks, we have to consider the
special characteristics of each network and then find appropriate
models and solutions. This paper
is intended as a first step to encourage researchers to explore
applications of repeated games for future wireless networks.

In addition, using repeated games in wireless networks has
many advantages, which create favorable conditions
for research in this area.

1) Interactions among nodes and users in wireless networks 
often happen repeatedly over multiple time periods. 
Thus, the nodes are able to observe actions of their 
opponents in the past. Repeated games allow players
to adjust their actions and adopt a certain strategy in
response to other players’ behaviors to optimize their
long-term benefits, which cannot be done with
a static game.

2) In wireless networks in particular, selfish behavior
of players who aim to achieve their own objectives
is common. Consequently, they act
only in their own interest, without regard for social
welfare or network-wide performance. This can have
deleterious effects on all players involved. To encourage
cooperation, we can impose rules and mechanisms
(e.g., punishment) that make cooperation self-enforcing. Such rules
and mechanisms can be modeled in repeated games, in
which the players are aware of the potential benefits of
cooperation through long-term interactions.

3) In wireless networks, communication is not perfect due 
to channel variations such as noise, fading, and signal 
attenuation. Additionally, network nodes are limited by 
hardware and energy supply. Repeated games offer a 
complete framework for handling noisy, incomplete and 
imperfect information about the network. Additionally, 
repeated games can support distributed decision making
by using local information, thereby avoiding communication
overhead and minimizing energy consumption.

Fig. 1. Taxonomy of applications of repeated games in wireless networks.

4) Repeated games support diverse equilibrium solution 
concepts which are suitable for different requirements 
of wireless networks. System designers have choices to 
implement rules and mechanisms to achieve a desirable 
outcome of the games developed to resolve problems 
in existing and emerging wireless networks. In addition,
by the Folk theorem, any feasible and individually rational outcome
can in theory be sustained as an equilibrium of the repeated game,
provided that the players are sufficiently patient.

5) With the vigorous development of mathematical tools
for solving complex problems of repeated games, we
can apply many variations of repeated game models to
address issues that are specific to and characteristic of the wireless
communication environment. For example, to deal with
noise or imperfect information, both of which
are common in wireless environments, we can apply
repeated game models under noise or imperfect
information, as shown in the next sections.


In Fig. 1, we present the taxonomy of repeated game
applications for wireless networks, which is organized based on
network models. Basically, we consider three major types of wireless
networks, i.e., structured networks (cellular and wireless
local area networks), unstructured networks (wireless ad
hoc networks), and cognitive radio networks. Additionally,
there are other wireless networks with unique characteristics,
which will also be introduced in this paper. The use of this
taxonomy stems from the fact that different networks have 
distinct properties that lead to different problems and issues 
to be solved. 

The rest of this paper is organized as follows. Section II
introduces the background and basics of repeated games. In
Sections III to VI, reviews of existing work are
given. We highlight trends and summarize current research
in Section VII. Moreover, we outline some open issues and
present some research directions in Section VII. Finally, we
summarize and conclude the paper in Section VIII.


II. Fundamentals of Repeated Games

Repeated game theory provides a formal framework
to model a multi-player sequential decision making process
and to explore the potential for cooperation in the long run. In
this section, we provide some preliminaries on repeated games
and their variants. We then compare repeated games with other
optimization and game theoretic approaches.

A. Definitions and Fundamental Concepts 

The basic component of a repeated game is called the
stage game $G$, which is a finite $N$-player simultaneous-move
game in strategic form, also called normal form. The
stage game can be represented by $\langle \mathcal{N}, (A_i), (u_i) \rangle$ with
a finite action space $A_i$ and payoff function $u_i$ for player
$i \in \mathcal{N} = \{1, 2, \ldots, N\}$, where $N$ is the total number of
players. In a repeated game, denoted by $G^T$, the players
play the same stage game for $T$ rounds or periods (possibly
$T \to \infty$). At the end of each round, the same structure
repeats itself. In other words, the game in the current stage
does not change the structure of the next stage. Usually,
repeated games assume the availability of a common entity
that produces a publicly observed signal, uniformly distributed
and independent across periods$^1$, so that perfect public
information is available to all players.

Repeated games can be broadly classified into two types, 
depending on whether the time horizon, i.e., the number of 
periods, is finite or infinite. The infinite time horizon game is 
suitable for the situations where players always presume the 
game to be played one more round with a high probability 
(i.e., without a known termination point). The finite time
horizon game describes the situation where the
termination point of the game is commonly known.

Let $a^t = \{a_1^t, a_2^t, \ldots, a_N^t\}$ denote the actions used by all
players in round $t$. Suppose that the game begins in round
$t = 1$. For $t \geq 2$, let $h^t = \{a^1, a^2, \ldots, a^{t-1}\}$ denote the
actions used in all rounds before $t$, and let $H^t = (A)^{t-1}$ be
the space of all possible round-$t$ histories. In each period, the
players can observe the full history of all previous periods. Let $\mathcal{A}_i$
be the space of probability distributions over $A_i$. The infinitely
repeated game is formally defined as follows.

$^1$ In this paper, we use round, period and game stage interchangeably.

Definition 1. Let $A^{\infty}$ represent the set of infinite sequences
of action profiles. An infinitely repeated game of a stage
game $G$ is an extensive-form game with simultaneous moves
and perfect information $\langle \mathcal{N}, H, P, (\succsim_i^*) \rangle$, where
$H = \{\emptyset\} \cup \left(\bigcup_{t=1}^{\infty} A^t\right) \cup A^{\infty}$, $P$ is the player function that assigns to every
non-terminal history $h \in H$ the set of all players, and $\succsim_i^*$ is a preference
relation on $A^{\infty}$ that satisfies the following notion of weak
separability: if $(a^t) \in A^{\infty}$, $a \in A$, $a' \in A$, and $u_i(a) \geq u_i(a')$,
then for all $t$, we have $(a^1, \ldots, a^{t-1}, a, a^{t+1}, \ldots) \succsim_i^* (a^1, \ldots, a^{t-1}, a', a^{t+1}, \ldots)$.

Note that in a repeated game, a strategy for player $i$ assigns
an action in $A_i$ to every finite sequence of outcomes in $G$.

Similarly, a finitely repeated game with a fixed and known time
horizon $T$ is formally defined as follows.

Definition 2. A $T$-round finitely repeated game of $G$ is an
extensive-form game of perfect information satisfying all
the conditions of Definition 1 with $\infty$ replaced by $T$. The
preferences can be represented by the mean payoff
$\sum_{t=1}^{T} u_i(a^t)/T$.

The overall payoff of a player $i \in \mathcal{N}$ is a weighted average
of the payoffs in each period, which is given by

$$U_i = (1 - \delta_i) \sum_{t=1}^{T} \delta_i^{t-1} u_i(a^t). \qquad (1)$$

The factor $(1 - \delta_i)$ normalizes the overall payoff so that the
payoffs in the repeated game are on the same scale as in the
stage game. The discount factor $\delta_i$ can be interpreted in two
ways. Firstly, it denotes how much a future payoff is valued
at the current period. Secondly, a player is patient about the
future, but the game ends at any round with probability $1 - \delta_i$.
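
As an illustration of the normalized discounted payoff in (1), the following minimal sketch evaluates the weighted sum for a given stream of stage payoffs. The function name and the sample payoff values are assumptions made for this example only; the weights start at $\delta^0$ in round $t = 1$.

```python
def discounted_average_payoff(stage_payoffs, delta):
    """Return (1 - delta) * sum_t delta**(t-1) * u_t for t = 1, 2, ...

    stage_payoffs: stage payoffs u_1, u_2, ... of one player (illustrative values).
    delta: discount factor in [0, 1).
    """
    if not 0.0 <= delta < 1.0:
        raise ValueError("delta must lie in [0, 1)")
    total = 0.0
    weight = 1.0  # delta**(t-1) for the first round t = 1
    for payoff in stage_payoffs:
        total += weight * payoff
        weight *= delta
    return (1.0 - delta) * total


# A constant stream of stage payoff 2 over a long horizon is worth
# approximately 2 after normalization, i.e., the stage-game scale.
long_run_value = discounted_average_payoff([2.0] * 2000, 0.9)
```

For a long constant stream, the normalized value approaches the stage payoff itself, which is the scaling property described above.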

Definition 3. In a repeated game, since all players observe
$h^t$, a pure strategy $s_i$ for player $i$ is a sequence of maps
$s_i^t$, each of which maps possible round-$t$ histories $h^t \in H^t$ to actions
$a_i \in A_i$. A mixed strategy $\sigma_i$ is a sequence of maps $\sigma_i^t$ from
$H^t$ to mixed actions $\alpha_i \in \mathcal{A}_i$.

A repeated game is suitable for modeling long-term interactions
and relationships among multiple players, and it can
achieve a good equilibrium outcome. Unlike in a single-stage
game, the repeated interaction among players in a
repeated game may lead to cooperation instead of
defection. This is due to the threat of potential punishment
by other players in future rounds of the game. In particular,
in the design of a repeated game, a strategy can be chosen
to compensate a player for the instantaneous payoff loss
incurred by cooperating, by rewarding the player later with an
additional payoff that exceeds the current loss. Punishment
rules are likewise required to deter players from deviating
from a socially optimal point.


B. Equilibrium Concepts and Strategies in Repeated Games 

1) Equilibrium Concepts: In a normal-form stage game, 
each player is concerned only about its own payoff, and will 
choose the strategy accordingly. The most commonly used 
solution concept of the stage game is the Nash equilibrium. 

Definition 4. A strategy profile $\sigma$ is a Nash equilibrium if,
for every player $i$ and every strategy $\sigma_i'$,

$$u_i(a(\sigma)) \geq u_i(a(\sigma_{-i}, \sigma_i')). \qquad (2)$$

At the Nash equilibrium, none of the players can improve its
payoff by a unilateral deviation. The Nash equilibrium strategy
is the best for a player if the others also use their Nash
equilibrium strategies. However, the Nash equilibrium has
some shortcomings. Firstly, a game may have no Nash equilibrium
in pure strategies. Secondly, there may exist multiple Nash equilibria
in a game; some equilibria may then be better than
others, and the players have to choose one of them.
Thirdly, compared with centralized optimization, the Nash
equilibrium might perform poorly, an inefficiency that is quantified by the
Price of Anarchy.
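
To make the equilibrium condition (2) concrete, the sketch below enumerates the pure-strategy Nash equilibria of a two-player stage game by checking every unilateral deviation. The packet-forwarding payoffs (F for forward, D for drop) and the matching-pennies payoffs are hypothetical values chosen for illustration; they are not taken from any of the surveyed works.

```python
from itertools import product


def pure_nash_equilibria(payoffs):
    """Return all pure-strategy Nash equilibria of a two-player game.

    payoffs maps an action pair (a1, a2) to the payoff pair (u1, u2).
    """
    actions1 = sorted({a1 for a1, _ in payoffs})
    actions2 = sorted({a2 for _, a2 in payoffs})
    equilibria = []
    for a1, a2 in product(actions1, actions2):
        u1, u2 = payoffs[(a1, a2)]
        # No unilateral deviation may strictly improve a player's payoff.
        best1 = all(u1 >= payoffs[(d, a2)][0] for d in actions1)
        best2 = all(u2 >= payoffs[(a1, d)][1] for d in actions2)
        if best1 and best2:
            equilibria.append((a1, a2))
    return equilibria


# Hypothetical forwarding game with a prisoner's-dilemma structure.
FORWARDING_GAME = {
    ("F", "F"): (3, 3),
    ("F", "D"): (0, 4),
    ("D", "F"): (4, 0),
    ("D", "D"): (1, 1),
}

# Matching pennies: a game without any pure-strategy equilibrium.
MATCHING_PENNIES = {
    ("H", "H"): (1, -1),
    ("H", "T"): (-1, 1),
    ("T", "H"): (-1, 1),
    ("T", "T"): (1, -1),
}
```

In the hypothetical forwarding game, mutual dropping is the only pure equilibrium although mutual forwarding gives both nodes a higher payoff, which is the kind of inefficiency that repetition is meant to overcome.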

For an extensive-form repeated game, the strategies are required
to induce a Nash equilibrium in every subgame, i.e., after every
history of the game. The resulting solution concept is the subgame perfect
equilibrium.

Definition 5. A strategy profile $\sigma$ is a subgame perfect
equilibrium if it is a Nash equilibrium, and for every history
$h(t)$, every player $i$, and every alternative strategy $\sigma_i'$,

$$u_i(a(\sigma, h(t))) \geq u_i(a(\sigma_{-i}, \sigma_i', h(t))). \qquad (3)$$

For a repeated game with a finite number of stages, backward
induction is commonly used to obtain the subgame
perfect equilibrium.

Typically, there is more than one equilibrium in the game.
Therefore, it is important to define and evaluate a preference
among equilibrium outcomes. A common approach is to
use the criterion of Pareto optimality. A payoff profile is
Pareto optimal if no strategy can
make at least one player better off without making any other
player worse off.

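Definition 6. Let $\Sigma \subseteq \mathbb{R}^N$ be a set of possible payoffs. For $\sigma$,
$\sigma' \in \Sigma$, if $\exists\, i$ such that $\sigma_i' > \sigma_i$ and $\nexists\, i$ such that $\sigma_i' < \sigma_i$, then $\sigma'$ Pareto dominates
$\sigma$. Then $\sigma \in \Sigma$ is Pareto optimal if there exists no $\sigma' \in \Sigma$
for which $\sigma_i' > \sigma_i$ for all $i \in \mathcal{N}$. And $\sigma \in \Sigma$ is strongly
Pareto optimal if there exists no $\sigma' \in \Sigma$ for which $\sigma_i' \geq \sigma_i$
for all $i \in \mathcal{N}$ and $\sigma_i' > \sigma_i$ for some $i \in \mathcal{N}$.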

Pareto optimality serves as a criterion to restrict the large
set of repeated game equilibria that arises when the players are patient.
The Pareto frontier refers to the set of all $\sigma \in \Sigma$
that are Pareto optimal. In the case of multiple equilibrium
outcomes, the ones on the Pareto frontier are superior to the others.
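
The Pareto dominance relation of Definition 6 can be sketched in a few lines. The example below filters a finite set of payoff profiles down to those that no other profile dominates, which corresponds to the strong form of Pareto optimality; the payoff values are hypothetical and reuse a prisoner's-dilemma structure.

```python
def pareto_dominates(p, q):
    """True if profile p is no worse than q for every player and
    strictly better for at least one player."""
    return (all(x >= y for x, y in zip(p, q))
            and any(x > y for x, y in zip(p, q)))


def pareto_frontier(profiles):
    """Keep the profiles that are not Pareto dominated by any other profile."""
    return [p for p in profiles
            if not any(pareto_dominates(q, p) for q in profiles)]


# Hypothetical payoff profiles of a two-player stage game.
PROFILES = [(3, 3), (0, 4), (4, 0), (1, 1)]
FRONTIER = pareto_frontier(PROFILES)
```

Here the profile (1, 1), which is the static equilibrium payoff in a prisoner's dilemma with these values, is the only one excluded from the frontier.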

TABLE I
Common Strategies in Repeated Games.

| Strategy | Description |
| --- | --- |
| Always cooperate | Cooperates on every move. |
| Always defect | Defects on every move. |
| Random play | Chooses randomly between defection and cooperation on every move. |
| Grim | Cooperates until one of the other players defects, then defects forever. |
| Tit-for-tat (TFT) | Starts by cooperating and then repeats the last action played by the opponent. |
| Generous TFT | Same as TFT, except that it cooperates with a probability $q$ when the opponent defects. |
| Tit for two tats | Cooperates on the first move and defects only when the opponent defects two times in a row. |
| Two tits for tat | Same as TFT, except that it defects twice when the opponent defects. |
| Suspicious TFT | Same as TFT, except that it defects on the first move. |
| Contrite TFT | Same as TFT when there is no noise. In a noisy environment, however, once it receives a wrong signal because of an error, it chooses cooperation twice in order to recover mutual cooperation. |
| Adaptive strategy | An adaptation rate $r$ is used to compute a continuous variable “world” according to the opponent’s move history. |
| Cartel maintenance | Players first compute a cooperation point that has a better payoff than the Nash equilibrium points. If any player deviates from cooperation while the others are still playing the cooperative strategy, all other players defect in the following game stages. |
| Forgiving strategy | A player starts with cooperation and plays cooperation (C) as long as everybody has played C in the past. If anybody plays defection (D) in any period, the player plays D for the next $k$ periods. After $k$ periods of punishment, the player returns to playing C until someone deviates again. |
| Cheat-proof | In game theory, an asymmetric game in which players have private information is said to be cheat-proof (or a truthful mechanism) if there is no incentive for any of the players to lie about or hide their private information. |

2) Strategies: The above equilibrium concepts describe the
properties of an equilibrium outcome, but they do not address
the problem of how to reach the equilibrium outcome of a
game. Indeed, as the repeated games are the extensive form
of the stage games, the strategies in the repeated games are
highly dependent on the best actions in the stage games. We
therefore introduce some principles in the design of strategies
for repeated games.

A simple way to check whether a given strategy
profile of a repeated game constitutes a subgame perfect
equilibrium is the one-shot deviation principle.

Theorem 1. One-Shot Deviation Principle. A strategy profile
for a finitely repeated game is a subgame perfect equilibrium
if and only if there is no player and no history such that the player can
increase its payoff in the subgame following that history by
choosing a different action at the beginning of the subgame
while leaving the remainder of its strategy unchanged.

By applying the one-shot deviation principle, a remarkable 
reduction in complexity of finding equilibrium strategies can 
be achieved. 

For finitely repeated games, the following theorem holds.

Theorem 2. If the stage game $G$ has a unique Nash equilibrium
$a^*$, then, for any $T$, the unique subgame perfect
equilibrium in the $T$-horizon repeated game $G^T$ is given by the
strategy profile in which every player after every non-terminal
history chooses $a^*$. If the stage game $G$ has multiple Nash
equilibria, then any outcome $h^T = (a^1, \ldots, a^T)$, in which
for every $t$ ($1 \leq t \leq T$) the action profile $a^t$ is one of the
Nash equilibria of $G$, can be obtained as the outcome of a
subgame perfect equilibrium of $G^T$.

For infinitely repeated games, the following two theorems
hold.

Theorem 3. Nash Folk Theorem: For every feasible payoff
vector $v$ with $v_i > \underline{v}_i$ for all players $i$, where $\underline{v}_i$ denotes the
minmax payoff of player $i$, there exists a discount
factor $\underline{\delta} < 1$ such that for all $\delta \in (\underline{\delta}, 1)$ there exists a Nash
equilibrium of $G(\delta)$ with payoffs $v$.

This theorem indicates that when the players are patient 
enough, any one-stage gain with a finite value is outweighed 
by even a small loss in payoff in every future round. 

Theorem 4. Subgame Perfect Folk Theorem: Let $a^*$ be a
static Nash equilibrium of the stage game with payoffs $v^*$. For
any feasible payoff $u$ with $u_i > v_i^*$ for all $i \in \mathcal{N}$, there exists
some $\underline{\delta} < 1$ such that for all $\delta > \underline{\delta}$, there exists a subgame
perfect equilibrium of $G^{\infty}(\delta)$ with payoffs $u$.


The subgame perfect Folk theorem shows that any payoff 
above the static Nash equilibrium payoffs can be the payoff 
of the subgame perfect equilibrium of the repeated game. 

Typically, repeated games employ trigger strategies to incentivize
cooperation and punish any defecting player if a certain
level of defection (i.e., the trigger) is observed. Let $\sigma_i^D$ denote
the defection payoff of player $i$, given that the other players play
the equilibrium strategy $\sigma_{-i}^*$. Let $\sigma_i^P$ represent the punishment
payoff of player $i$. Generally, we have $\sigma_i^D > \sigma_i^* > \sigma_i^P$. To
design the trigger strategy for a repeated game, the following
condition should hold, i.e.,

$$\frac{\sigma_i^*}{1 - \delta} \geq \sigma_i^D + \frac{\delta \sigma_i^P}{1 - \delta}, \qquad (4)$$

which gives

$$\delta \geq \frac{\sigma_i^D - \sigma_i^*}{\sigma_i^D - \sigma_i^P}. \qquad (5)$$
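
The threshold in (5) is straightforward to evaluate numerically. The following sketch, with function names and payoff values that are assumptions for illustration, computes the smallest discount factor for which a permanent punishment deters a one-stage deviation, and checks a given discount factor against it.

```python
def critical_discount_factor(coop, deviate, punish):
    """Smallest delta satisfying condition (5).

    coop:    per-stage payoff when everyone follows the equilibrium strategy
    deviate: one-stage payoff obtained by defecting
    punish:  per-stage payoff during the punishment phase
    """
    if not deviate > coop > punish:
        raise ValueError("expected deviate > coop > punish")
    return (deviate - coop) / (deviate - punish)


def cooperation_is_sustainable(coop, deviate, punish, delta):
    """True if delta is large enough for the trigger strategy to deter defection."""
    return delta >= critical_discount_factor(coop, deviate, punish)


# Hypothetical stage payoffs: cooperation 3, defection 4, punishment 1.
THRESHOLD = critical_discount_factor(3.0, 4.0, 1.0)
```

With these hypothetical payoffs, the threshold is one third: the larger the one-stage temptation relative to the punishment loss, the more patient the players must be.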


Note that the level of punishment and the sensitivity of
the trigger vary with different trigger strategies. In Table I,
we summarize some of the well-known strategies in
repeated games applied to wireless networks.
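
Several of the strategies listed in Table I can be written as simple functions of the observed history. The sketch below is one possible encoding for a two-player game with actions C (cooperate) and D (defect); the function names and the history representation are assumptions of this example.

```python
C, D = "C", "D"


def always_defect(own_history, opp_history):
    return D


def grim(own_history, opp_history):
    # Cooperate until the opponent defects once, then defect forever.
    return D if D in opp_history else C


def tit_for_tat(own_history, opp_history):
    # Start by cooperating, then repeat the opponent's last action.
    return opp_history[-1] if opp_history else C


def tit_for_two_tats(own_history, opp_history):
    # Defect only after two consecutive defections by the opponent.
    return D if opp_history[-2:] == [D, D] else C


def play(strategy1, strategy2, rounds):
    """Play the repeated game and return both action histories."""
    h1, h2 = [], []
    for _ in range(rounds):
        a1 = strategy1(h1, h2)
        a2 = strategy2(h2, h1)
        h1.append(a1)
        h2.append(a2)
    return h1, h2
```

For instance, `play(tit_for_tat, always_defect, 4)` lets tit-for-tat cooperate once and then mirror the defection in every later round.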


C. Variations and Extensions 

Next, we briefly discuss some variations and extensions of 
a repeated game that are often used in wireless networks. 

1) Repeated Game under Noise: The repeated game introduced above
can be regarded as a game played in a noise-free
environment. Namely, all direct or indirect observations
in the repeated game are assumed to be correct. However, in
reality, the players make errors in observing each other’s behavior.
A repeated game under noise is an alternative tool
that can model situations in which the players have wrong
knowledge about their opponents. In particular, two types of
error could occur during the iterations of play (32). The first
type of error, namely a perception error, accounts for the
wrong observation of other players. The second type of error,
which is called an implementation error, refers to an action that
is taken wrongly instead of the intended one
due to interference in the environment. For example, a relay node in a multihop
network may misinterpret the behavior of other nodes in
the same network; this is a perception error. Alternatively, the relay
node may decide to take a cooperative action (i.e.,
to relay packets of other nodes), but, due to a temporarily
poor channel condition, the packets may not be successfully
forwarded. The action is then possibly observed as a selfish
defection instead of a cooperative relay action. This is an
implementation error.

Usually, noise appears in the form of random errors, which
makes the possible outcomes in each stage
unpredictable. Thus, the players can only respond
based on expected stage payoffs. The existence of noise in
an environment considerably increases the complexity of
repeated games. Moreover, cooperation becomes much more
difficult to maintain ED- Typically, there exist three approaches
to address the noise issues in a repeated game EH.
The first approach is called “generosity”, which lets players
tolerate some noncooperative behavior, i.e., to a certain
degree the players do not punish a deviating player. The
second approach is called “contrition”. This approach builds
contrition into a reciprocating strategy so that a player avoids responding to the
other player’s defection when that defection was provoked by the player’s own unintended action.
The third approach, namely “win-stay, lose-shift”, adopts the
strategy that repeats the same action if the latest payoff is high,
but changes the action otherwise.
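
The effect of implementation errors on these approaches can be explored with a small simulation. The sketch below is a simplified model under stated assumptions: two players use the same strategy, each intended action is flipped independently with probability `error_prob`, both players observe the realized actions, and for win-stay, lose-shift the latest payoff is treated as high whenever the opponent cooperated. All names and parameters are hypothetical.

```python
import random

C, D = "C", "D"


def flip(action):
    return D if action == C else C


def tit_for_tat(my_last, opp_last, rng):
    # Cooperate first, then copy the opponent's last realized action.
    return C if opp_last is None else opp_last


def make_generous_tft(q):
    def generous_tft(my_last, opp_last, rng):
        # After a defection, forgive (cooperate) with probability q.
        if opp_last == D and rng.random() >= q:
            return D
        return C
    return generous_tft


def win_stay_lose_shift(my_last, opp_last, rng):
    if my_last is None:
        return C
    # Assumption: the payoff was "high" exactly when the opponent cooperated.
    return my_last if opp_last == C else flip(my_last)


def cooperation_rate(strategy, error_prob, rounds, seed=0):
    """Fraction of realized cooperative actions in self-play under noise."""
    rng = random.Random(seed)
    last1 = last2 = None
    cooperative_moves = 0
    for _ in range(rounds):
        a1 = strategy(last1, last2, rng)
        a2 = strategy(last2, last1, rng)
        # Implementation error: the realized action differs from the intended one.
        if rng.random() < error_prob:
            a1 = flip(a1)
        if rng.random() < error_prob:
            a2 = flip(a2)
        cooperative_moves += (a1 == C) + (a2 == C)
        last1, last2 = a1, a2
    return cooperative_moves / (2 * rounds)
```

Calling `cooperation_rate` with a small positive `error_prob` for each strategy gives a rough feel for how quickly plain tit-for-tat loses cooperation after a single error compared with the more tolerant variants; the exact figures depend on the seed and the horizon.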

2) Repeated Game with Imperfect Public Monitoring:
In repeated games with imperfect public monitoring, players
cannot directly observe the other players’ strategies, but can
observe imperfect and public signals about them. The players’
information is a stochastic public signal, the distribution of
which depends on the strategy profile chosen by the
players. Similar to perfect public monitoring, there is also a
recursive structure in the case of imperfect public monitoring.
However, there is often no proper subgame, and thus the
subgame perfect equilibrium may not be effective. To address
this problem, the concept of perfect public equilibrium (13)
has been introduced. In this concept, the players adopt
public strategies based on the public history, which is the sequence
of realizations of the public signal. The Folk theorem for
imperfect public information has also been proposed in (4).

3) Repeated Game with Imperfect Private Monitoring:
Repeated games with imperfect private monitoring deal with
situations in which public information about the players is not
available, and each player can only obtain imperfect
private information about the other players through its own
direct or indirect monitoring. The difficulties associated with
private monitoring lie in two aspects (5). Firstly, the games
lack a recursive structure, so the equilibria do not
possess a simple characterization. Secondly, at each round,
players must conduct statistical inference on what the others are
about to do. There are mainly two model settings that bypass
the above two difficulties. One is called the No Discounting
Model, or $\epsilon$-Rationality, in which there is no discounting loss,
i.e., $\delta = 1$, or the discounting loss in the average discounted
payoff is tolerated. The other is the Communication Model (7),
which introduces communication into the repeated games. At
each stage, the players are asked to reveal their private signals,
but they can lie if that is beneficial. By constructing
equilibria in which a player’s report is used to discipline the other players,
truth-telling can be induced among the players, and equilibrium
strategies can be devised based on the publicly observable
history of communication.

4) Repeated Game with Incomplete Information: In a repeated
game, when some players lack information about the
others, they are said to have incomplete information, and
their games are accordingly called repeated games with incomplete
information. This type of game is developed to capture
the widespread situations in which a variety of features of the
environment may not be commonly known by all the involved
players. Unlike in a repeated game with imperfect (public or
private) monitoring, a player with incomplete information
might not have common knowledge of the following: i)
the payoffs of himself and the other players; ii) who the
other players are and what types they have; iii) what actions are possible for himself and
the other players; iv) how the actions affect the outcome; v) what
the preferences of the other players are; vi) what the other
players know about what he knows.

A variant of the Folk theorem has also been introduced in
(36) for a repeated game with incomplete information. It
is proven that any payoffs that Pareto dominate
the Nash equilibrium can be sustained at an equilibrium of
a finitely repeated game with incomplete information. With
incomplete information, the Bayesian Nash equilibrium is a 
general solution concept cm In a Bayesian repeated game, 
the players make the best response action based on the beliefs 
about the other players’ strategies. During the repetition of the 
game, the players can iteratively update their beliefs through 
learning [1681. 
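
The iterative belief update in a Bayesian repeated game can be illustrated with Bayes' rule for two opponent types. The sketch below is a minimal example under assumed names and probabilities: a player holds a belief that its opponent is of a cooperative type and revises it after observing one action.

```python
def update_belief(prior_coop_type, observed_action,
                  p_coop_if_coop_type, p_coop_if_selfish_type):
    """Posterior probability that the opponent is of the cooperative type.

    prior_coop_type:        prior belief in the cooperative type
    observed_action:        "C" (cooperate) or "D" (defect)
    p_coop_if_coop_type:    probability that a cooperative type plays "C"
    p_coop_if_selfish_type: probability that a selfish type plays "C"
    """
    if observed_action == "C":
        like_coop, like_selfish = p_coop_if_coop_type, p_coop_if_selfish_type
    else:
        like_coop, like_selfish = 1.0 - p_coop_if_coop_type, 1.0 - p_coop_if_selfish_type
    evidence = prior_coop_type * like_coop + (1.0 - prior_coop_type) * like_selfish
    if evidence == 0.0:
        raise ValueError("observation has zero probability under the model")
    return prior_coop_type * like_coop / evidence


# Hypothetical numbers: start undecided, then observe one cooperative action.
POSTERIOR = update_belief(0.5, "C", 0.9, 0.2)
```

Applying the function round after round, with the posterior of one round used as the prior of the next, gives a simple form of the learning process described above.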

D. Comparisons with other tools 

We provide some brief comparison among a repeated game 
and other commonly used optimization or game approaches. 

1) Markov decision process (MDP): An MDP O is an
approach to making optimal sequential decisions under uncertainty.
It is designed for a player making a decision based on a
state and interacting with a system or environment. An MDP
model is composed of states, actions, and rewards. A policy
is a mapping from states to actions. When an action is taken
at any state, the player receives a reward. An optimal policy
maximizes the long-term reward, which could be the discounted or the
average reward E). However, the MDP is for the decision making
of a single player. An interaction among multiple players over
multiple time periods cannot be modeled using an MDP.

2) Stochastic game: A stochastic game [10] is a generalization
of an MDP with some applications in wireless networks.
Strategies in stochastic games are made based on the
statistics of the history of interactions. Specifically, in each round, a player
makes decisions based on a competitive policy. After the action
is taken, the current state transits to the next state. By contrast,
in a repeated game, there is no state transition.
Additionally, a player’s strategy is not necessarily dependent
on the past values of its opponent’s randomizing probabilities.
Instead, it can depend only on the past values of its opponent’s
payoff. This major difference distinguishes the applications of
repeated games from those of stochastic games.

3) Coalition formation game: A coalition formation game
[11] is a type of cooperative game that models an interaction
of players who form coalitions to improve their individual
payoffs. The strategies for forming a stable coalitional
structure among the players can be broadly classified into
two types: myopic and far-sighted. The former allows the
players to adapt their strategies given the current state of
the coalition, while the latter lets the players choose their
strategies by learning and predicting the future strategies of the
other players. A repeated game is similar to the far-sighted
coalition formation game in that both games capture long-term
payoffs. The main difference is that a repeated game
can model different strategies (e.g., punishment), not necessarily
cooperation.

4) Differential game: A differential game [12] is an extension
of the optimal control framework. The two aim to find an
optimal dynamic control strategy for a system with multiple
agents and with a single agent, respectively. In differential games,
the strategy is a continuous function of time. The solution of a
differential game can be the open-loop or the closed-loop Nash
equilibrium. Again, unlike repeated games, the differential
game lacks a trigger, and punishment cannot be implemented.
Furthermore, differential games require the system dynamics to be
described by ordinary differential
equations, and the utility function is typically linear-quadratic,
which considerably limits the scope of applications.

III. Applications of Repeated Games in Cellular 
and Wireless Local Area Networks 

In this section, we review repeated game models developed 
to solve problems in cellular and wireless local area networks 
(WLANs). The repeated games have been used to address the 
following issues. 

• Multiple Access Control: In cellular networks and WLANs, wireless
nodes need to connect to common access points,
and this leads to contention in accessing a common
radio resource. Repeated games are used to control the
access and also to enhance network performance by
encouraging the nodes to cooperate.

• Security: Instead of competing with each other, which
can harm all nodes, wireless nodes can
choose a cooperative solution to improve their benefits.
However, this cooperation is based on mutual trust, and
thus if a player is selfish or malicious, the partnership may
be severed. Repeated games are used to detect deviations
and to enforce cooperation among the players.

• Quality-of-service (QoS) Management: Service
providers need to manage their relations with other
parties, e.g., customers or other providers, to optimize
their profits while guaranteeing QoS for the customers.
Repeated games are used by the service providers to
manage the interaction with these parties to minimize
the cost and maximize the revenue.


In the following, we review related work
for each of the aforementioned issues.

A. Multiple Access Control 

In cellular and wireless local area networks (WLANs), wireless
users communicate with each other or with the Internet
through access points and base stations. Therefore, one of the
challenges is access control for multiple users. This section
reviews the repeated game models for multiple access
control problems under two access methods: decentralized
and centralized access. Fig. 2 outlines the corresponding
approaches.


[Figure: Multiple access control. Decentralized access method: transmission power control, access probability control. Centralized access method: bandwidth allocation management, data rate management.]

Fig. 2. Solutions for the multiple access control problems in cellular networks and
WLANs.

1) Decentralized access method: 

a) Transmission power control: MacKenzie [14] was
among the first to apply repeated games to power
control problems in wireless networks. In the considered
model, multiple users share the same spectrum allocated by
a base station. In the repeated game, the players are the
users. Each player chooses a non-negative power level to
transmit signals to the base station. The payoff of each
player is the number of bits successfully transmitted per
time slot. The authors introduced a trigger strategy in which,
at the beginning of the game, the players cooperate by
transmitting at the power that yields the desired received
power. If any user transmits at a higher power, which causes
interference to the other users, then in the next time slot
all other users increase their power levels to those of the
noncooperative Nash equilibrium to punish the deviating
player. This punishment lasts for one stage game after the
deviating user is detected. After that, the users cooperate
again by using moderate power levels.
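
The trigger rule described above can be sketched as a small state machine. This is a minimal illustration, not the cited authors' implementation: the class name, the numeric power levels, and the assumption that every user perfectly observes all transmit powers are introduced here, and for simplicity every user (including the deviator) follows the same rule.

```python
class OneShotTrigger:
    """Trigger strategy with a single punishment stage (illustrative sketch)."""

    def __init__(self, p_coop, p_nash):
        self.p_coop = p_coop      # agreed cooperative power level (assumed value)
        self.p_nash = p_nash      # one-shot Nash equilibrium power (assumed value)
        self.punish_next = False  # set when a deviation has just been observed
        self.punishing = False    # True while the current slot is a punishment slot

    def act(self):
        """Choose the power for the current slot."""
        self.punishing = self.punish_next
        self.punish_next = False
        return self.p_nash if self.punishing else self.p_coop

    def observe(self, powers):
        """Record the powers all users chose in the slot that just ended."""
        # High powers during a punishment slot are expected, so they do not
        # trigger a further punishment.
        if not self.punishing and max(powers) > self.p_coop:
            self.punish_next = True


def simulate(num_slots=5, deviate_slot=2):
    """Three users follow the strategy; user 0 deviates once."""
    users = [OneShotTrigger(p_coop=1.0, p_nash=3.0) for _ in range(3)]
    trace = []
    for t in range(num_slots):
        powers = [u.act() for u in users]
        if t == deviate_slot:
            powers[0] = 2.0  # user 0 transmits above the agreed level
        for u in users:
            u.observe(powers)
        trace.append(powers)
    return trace


if __name__ == "__main__":
    for t, powers in enumerate(simulate()):
        print(t, powers)
```

In this sketch the deviation in slot 2 is followed by exactly one slot at the Nash power, after which all users return to the cooperative level.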

Similarly, Han et al. aa also considered power control.
The players are the nodes, but the actions are transmission
rates instead of power levels as in [14]. The payoff for each
player is a function of the profit obtained from each successful
transmission minus the cost of link usage. To encourage
players to cooperate in the repeated game, the authors used
the Cartel maintenance strategy d. Specifically, the players
initially transmit data at a cooperative transmission rate
agreed upon in advance. The players then compute
the successful transmission probability and compare it with a
threshold. If the computed probability is lower than the
threshold, they play the noncooperative strategy for the next
few game stages. Afterward, all the players return to
cooperation. To determine the optimal parameters, the
authors used the policy gradient method as presented in ESI.
Under the proposed strategy, it is proven in ifiTS that the
solution of this game is a perfect public equilibrium d.
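
A threshold rule of this kind can be sketched as follows. The function name, the parameter values, and the convention that observations made during a punishment phase are ignored are assumptions made for this illustration; the cited work additionally tunes the threshold and the punishment length, which is not shown here.

```python
def cartel_maintenance_rates(success_probs, r_coop, r_noncoop,
                             threshold, punish_len):
    """Rate played in each stage, given the success probability observed
    at the end of that stage (illustrative sketch)."""
    rates = []
    remaining = 0  # punishment stages still to be served
    for observed in success_probs:
        if remaining > 0:
            # Punishment phase: play the noncooperative rate and ignore
            # the observation made in this stage.
            rates.append(r_noncoop)
            remaining -= 1
        else:
            rates.append(r_coop)
            if observed < threshold:
                remaining = punish_len  # start punishing in the next stage
    return rates


if __name__ == "__main__":
    observations = [0.9, 0.6, 0.3, 0.3, 0.9, 0.9]
    print(cartel_maintenance_rates(observations, r_coop=1.0, r_noncoop=2.0,
                                   threshold=0.7, punish_len=2))
```

Because the success probability is a noisy public signal, a punishment phase can start even when nobody deviated, which is why the punishment is kept finite in this sketch.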

Unlike the models of MacKenzie and Han et al., in which
wireless users can transmit data to the base station
simultaneously, Auletta et al. ED assumed that there is at most
one successful transmission at each game stage. A player
successfully transmits data if and only if its signal strength
is higher than the sum of the total signal from the other
players and the noise. Since at most one player can transmit
successfully, the authors applied an alternating transmission
strategy in which only one player is allowed to transmit at
each game stage. If the scheduled player transmits data but
the transmission fails, i.e., someone has deviated from the
cooperation, a punishment is carried out in the next few
stages. In the punishment phase, all the players transmit
data at the highest power level.
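
The success condition and the alternating schedule can be illustrated with a short sketch. The function names, the use of received powers as plain numbers, and the round-robin order are assumptions introduced here, not details taken from the cited paper.

```python
def successful_transmitters(received_powers, noise):
    """Indices of users whose signal exceeds the other users' total plus noise."""
    total = sum(received_powers)
    return [i for i, p in enumerate(received_powers)
            if p > (total - p) + noise]


def round_robin_stage(stage, num_users, power):
    """Cooperative profile: only the scheduled user transmits."""
    return [power if i == stage % num_users else 0.0
            for i in range(num_users)]


if __name__ == "__main__":
    # Cooperative stage: the scheduled user is the only successful one.
    print(successful_transmitters(round_robin_stage(4, 3, 2.0), noise=0.5))
    # A second user transmits at the same power: nobody succeeds, which
    # the scheduled user can take as evidence of a deviation.
    print(successful_transmitters([2.0, 2.0, 0.0], noise=0.5))
```

The condition admits at most one successful user per stage, since two users cannot each exceed the sum of all other signals.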

In the work of MacKenzie, Han et al., and Auletta et al.
discussed above, the channel between the nodes and the base
station is assumed to be constant over time. However, this
assumption does not always hold.
Therefore, the authors in [20] investigated two cases with
channel variations, namely, fast power control (FPC) and slow
power control (SPC). In FPC, the channel gain is constant over
the game stage. In SPC, the channel gain can vary over
the game stages. The payoff is the ratio of the transmission
rate multiplied by the successful transmission rate to the
transmission power level. It is shown that the static one-shot
game has a non-saturated Nash equilibrium. The authors also
proposed a trigger strategy for the nodes, and it is proven that
the repeated game has a subgame perfect equilibrium. The
proof of the existence of the equilibrium can be found in [21].
To detect a deviating player, the authors proposed a
detection mechanism based on a public signal. If the
players cooperate, the public signal is constant. However,
if someone deviates, the public signal is no longer constant,
and thus the players switch to the noncooperative strategy.
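
As a hedged illustration of the payoff described above, one common way to write such an energy-efficiency ratio is shown below. The symbols $R_i$ (transmission rate), $f$ (probability of successful transmission as a function of the signal-to-interference-plus-noise ratio $\gamma_i$), and $p_i$ (transmit power) are notation assumed here rather than taken from the cited work.

```latex
u_i(p_1,\dots,p_N) \;=\; \frac{R_i \, f\bigl(\gamma_i(p_1,\dots,p_N)\bigr)}{p_i},
\qquad p_i > 0 .
```

Read this way, the payoff counts successfully delivered bits per unit of transmit energy, so raising $p_i$ pays off only while the gain in $f$ outweighs the linear growth of the denominator.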

For the channels with SPC, since the channel gain can
vary, the payoff of each player depends not only on the joint
actions but also on the channel gains in each game stage. The
repeated game therefore becomes a stochastic repeated game
(SRG) that is modeled through an irreducible state transition
probability. In an SRG, the standard Folk theorem no longer
applies since the stage game changes over time, and thus the
authors applied an extension of the Folk theorem that is
proven for stochastic games with public information [22]. With
the extended Folk theorem, it is demonstrated that there exists
a perfect public equilibrium strategy of the stochastic game.

b) Access probability control: Cagalj et al. [23] studied
the multiple access problem in CSMA/CA networks.
CSMA/CA networks operate on the assumption that the
users are honest and follow the protocol predefined in
a standard (e.g., IEEE 802.11 DCF [24]). However, selfish
users can modify the rules defined by the protocol to gain
more benefits. Consequently, the honest users experience
poor performance and unfairness in using a common radio
resource. In the repeated game, the players are the wireless
users, their actions are the selections of channel access
probabilities, and the payoff function is the achievable
throughput. The authors developed a distributed learning
algorithm for the users to adjust their behaviors such that
their strategies converge to the Pareto optimal point. However,
a player may not follow the designed rules and may deviate
from the cooperation. Thus, the authors proposed a detection
and penalizing mechanism. To detect deviating users, the
authors assumed that the users are able to measure the
throughput of all other users in the network. Any user whose
throughput differs from that of the other users is treated
as a deviating player. The penalizing mechanism is then
executed: the packets transmitted by the deviating user are
jammed by the other, cooperating users.
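
A throughput-based detection step of this kind can be sketched as below. The median reference, the relative tolerance, and the choice to flag only users with higher throughput are assumptions made for this illustration; the cited paper's exact test is not reproduced.

```python
from statistics import median


def detect_deviators(throughputs, rel_tol=0.2):
    """Return the users whose measured throughput exceeds the median of
    all users by more than rel_tol (illustrative sketch).

    throughputs: dict mapping a user id to its measured throughput.
    """
    reference = median(throughputs.values())
    return sorted(user for user, value in throughputs.items()
                  if value > reference * (1.0 + rel_tol))


if __name__ == "__main__":
    measured = {"a": 1.0, "b": 1.1, "c": 2.5, "d": 0.9}
    print(detect_deviators(measured))
```

The tolerance matters in practice because measured throughputs fluctuate even among honest users; a tolerance that is too tight would trigger jamming against cooperating nodes.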

While the solution proposed in [23] requires modification
of the MAC layer of wireless nodes, the approach proposed
in [25] can protect wireless users from selfish users
without any change to the MAC protocol. The players are the
same, but their actions are the selections of backoff
configurations. The authors defined three configurations for
the nodes, i.e., a standard backoff configuration predefined by
IEEE 802.11, a greedy configuration which accesses the
channel once a collision happens, and a selfish configuration
which accesses the channel after a collision happens in two
time slots. The payoff is the throughput. The payoff at every
game stage is equally weighted (i.e., there is no discounting
of future payoffs). Thus, the payoff can be quantified by the
liminf-type asymptotic [26]. The authors then proposed a
cooperation strategy via randomized inclination to
selfish/greedy play (CRISP) to detect and punish selfish nodes.
Under CRISP, the strategies of the nodes form a subgame
perfect equilibrium. However, the existence of the
subgame perfect equilibrium requires that the payoff function
of the players be of the liminf-type asymptotic [26].
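
For reference, an undiscounted payoff of this kind is commonly formalized as a limit of means. The textbook form is given below, with $u_i(a^t)$ denoting the stage payoff of player $i$ under the action profile $a^t$ played at stage $t$; this notation is assumed here and is not taken from the cited works.

```latex
U_i \;=\; \liminf_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} u_i\bigl(a^{t}\bigr),
\qquad \text{in contrast to the discounted payoff} \qquad
U_i^{\delta} \;=\; (1-\delta) \sum_{t=1}^{\infty} \delta^{\,t-1} u_i\bigl(a^{t}\bigr),
\quad 0 < \delta < 1 .
```

Because every stage carries the same weight, any finite number of deviation or punishment stages leaves $U_i$ unchanged, whereas under discounting early stages dominate.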

2) Access management based on a central node: In the
following, we review two schemes for network access
that use a central node. In the first scheme, the central
node decides the amount of bandwidth that each node
uses. In the second scheme, the central node indicates the
transmission rate for each node.

a) Bandwidth allocation management: The authors
in [27] examined a non-monetary mechanism introduced
in [28] for the bandwidth allocation problem with the aim of
achieving the Pareto optimal equilibrium. The authors adopted
the concept of service purchasing power, which is interpreted
as a “constant budget”. The purchasing power is determined
based on the contract between the customer and the
service provider. Based on the users’ service purchasing power,
the base station calculates the bandwidth allocated to each
customer according to a predefined rule. In the repeated game,
the players are the nodes, and they can select one of two
actions, namely, cooperate or defect. A player defects if it
uses the maximal allocated bandwidth, while a player
cooperates if it utilizes the allocated bandwidth only when its
marginal utility exceeds the marginal social cost. To enforce
cooperation, the tit-for-tat (TFT) strategy is applied. To
detect and punish a defecting user, the base station monitors
the users’ behaviors and computes their weighted ratios. If a
player is cooperative, the base station increases the purchasing
power of this player. Consequently, this player may receive
more bandwidth in the next game stage. By contrast, if a player
defects, the base station reduces its purchasing power, and
thus this player is allocated less bandwidth.
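
The reward and punishment loop described above can be sketched as follows. The multiplicative update, the step size, the floor value, and the proportional allocation rule are all assumptions made for this illustration; the cited work uses its own predefined rule and weighted ratios, which are not reproduced here.

```python
def update_purchasing_power(power, cooperated, step=0.1, floor=0.01):
    """Raise the purchasing power after cooperation, lower it after defection."""
    if cooperated:
        return power * (1.0 + step)
    return max(floor, power * (1.0 - step))


def allocate_bandwidth(total_bandwidth, powers):
    """Share the bandwidth in proportion to purchasing power (assumed rule)."""
    total_power = sum(powers.values())
    return {user: total_bandwidth * p / total_power
            for user, p in powers.items()}


if __name__ == "__main__":
    powers = {"a": 1.0, "b": 1.0}
    # User "a" cooperated in the last stage, user "b" defected.
    powers["a"] = update_purchasing_power(powers["a"], cooperated=True)
    powers["b"] = update_purchasing_power(powers["b"], cooperated=False)
    print(allocate_bandwidth(10.0, powers))
```

Under these assumptions a single defection shifts bandwidth from the defector to the cooperating user in the next stage, which mirrors the tit-for-tat flavor of the mechanism.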

Unlike in [27], the authors in [29] considered bandwidth
allocation based on the traffic demand of users. The mobile
users first declare their traffic demands to the base station,
and then the base station decides the amount of bandwidth
allocated to the users. However, the users may report fake
demands, and thus repeated games are used to deter selfish
users and enforce cooperation. The authors included
a cost function in the payoff of the players. Specifically, when
the users submit their traffic demands, the base station charges
a fee depending on the demand. The payoff is the profit obtained
from transmitting packets successfully minus the cost. With
the proposed truthful mechanism, it is shown that selfish
users cannot gain a higher payoff than honest users,
and hence the selfish users have no incentive to deviate from
truthful declaration.

While the bandwidth allocation problem is solved based
on the contract in [27] and on the traffic demand in [29],
in [30] the decision relies on the channel conditions reported
by the users. After the users send channel state information to
the base station, the base station decides the leading player on
each channel based on the signal-to-noise ratio (SNR) of the
users. Consequently, weak users with poor channel conditions
may have no opportunity to transmit or receive data. Therefore,
a repeated game is used to make the users cooperate and
to create opportunities for weak users. In this game, users who
achieve high profit but do not cooperate are punished by
an added cost in the next game stage. Thus, selfish users
have no incentive to deviate.

b) Data transmission rate management: The authors
in ED studied the packet scheduling problem in a wireless
mesh network in which mesh clients communicate with a
mesh router. The mesh clients report their channel conditions
to the mesh router. Based on this information, the mesh
router selects one mesh client and allows it to transmit in the
current time slot. However, mesh clients could be selfish
and report bogus channel information to the mesh router.
Consequently, a selfish client can gain benefits while the
overall network performance is adversely affected. To
address this problem, the authors proposed a trigger strategy,
namely “Striker,” to detect and punish selfish users when
they report fake information. In this strategy, if a player
is detected to defect, all other players report the highest
channel condition to the mesh router in the next game stage.
This punishment lasts for a certain number of periods. To
detect a defector, each mesh client measures the average
data rate of the other mesh clients and compares it with a
predefined threshold.

In [33], the authors studied an allocation scheme that allows
multiple users to transmit data to the access point
simultaneously. The authors built a framework for multiple
access by introducing two new entities, namely, the regulator
and the system optimizer. The users send their service
requirements to the regulator and report channel state
information to the system optimizer. Based on the received
information, the regulator computes the pricing parameter, and
the system optimizer determines the optimal power allocation
and sends it to the users. A selfish user can gain benefits by
transmitting misinformation about its channel state to the
system optimizer. Therefore, a trigger strategy is applied at
the system optimizer to detect and punish the selfish user. For
detection, after observing the actual rates of the users and
comparing them with the reported channel states, the system
optimizer can identify the cheater, which is then removed from
the cooperation. With this strategy, the honest users always
gain benefits, while the selfish user is punished and receives
a low overall payoff. However, the framework cannot be applied
when there are multiple cheaters.

In the same context, the authors in [34] also studied the data
transmission rate control problem. However, the base station
is also treated as a player in the game. Therefore, the players
are the wireless nodes and the base station. The actions of the
wireless nodes are the transmission power levels, and the action
of the base station is the decoding order. The players make
their decisions simultaneously at each game stage. The payoff
of a wireless node is its achievable rate, while that of the
base station is the revenue that the wireless nodes pay per unit
rate. The authors then designed a strategy to enforce
cooperation among the nodes. At the beginning, the base station
announces its rate reward vector to the players, and then, based
on the rate reward, each player determines the optimal rate
control that maximizes the sum rate achievable by using the
optimal centralized control policy as presented in [33]. If a
player is detected to be a deviator, the base station decodes
this player first for some periods, and the other players apply
the water-filling algorithm during these periods. It is proven
that the achievable transmission rate of the deviating node is
reduced, while that of the others is maximized. Based on the
Fudenberg and Maskin theorem [36], it is shown that under
the proposed strategies, when the repeated game is played
infinitely, all the boundary points of the capacity region are
achievable and the obtained equilibrium of the repeated game
is a subgame perfect equilibrium.

B. Security 

In this section, we review the repeated game models used to
address some security issues. We classify them into three types
based on the interaction of users with their partners, as shown
in Fig. 3.

Fig. 3. Interaction between (a) user and user, (b) user and service provider, and (c) user and eavesdropper.

1) Interaction between user and user: In mobile commerce
(M-commerce) markets, transactions are performed online and
are not controlled by authorities, and thus buyers can cheat
sellers by agreeing to buy or use the goods or services offered
by the sellers but then not paying the agreed fees. Therefore,
a repeated game model was introduced in [37] to address this
problem. The authors first defined two terms, i.e., member and
M-alliance. A member is a trader (seller or buyer) who takes
part in one M-alliance to perform transactions, and each member
belongs to only one M-alliance. Each transaction is considered
as a stage game, and if the game is played only once, the seller
and buyer will always choose a cheating strategy, which leads
both of them to gain nothing. However, if the game is played
repeatedly, there is an incentive for the players to cooperate
by choosing an honest strategy. In the repeated game, the
players are the seller and buyer, and they can choose one of
two actions, namely, honest or cheat, for each transaction. The
payoff is the profit that the players gain after each
transaction. The authors then proposed a trigger strategy to
enforce cooperation. If a player (player A) cheats, the cheated
player (player B) notifies its M-alliance (MB). The
MB then checks whether the complaint is true or not. If the
complaint is true, the MB asks the MA (the M-alliance of
player A) for compensation. After verifying the information,
the MA asks player A to compensate player B. If player
A pays the compensation, the game remains in cooperation.
However, if player A does not compensate, the MA revokes
its membership and sends a warning message to the other
M-alliances. If the MA then compensates player B on behalf
of player A, the MA keeps its reputation. However,
if the MA does not compensate, its reputation is reduced, which
affects the transactions of all members in its alliance. The
authors then designed the conditions for the punishment such
that selfish players do not have any incentive to deviate
from cooperation.
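
The exact conditions derived by the authors depend on the compensation and reputation rules above and are not reproduced here. As a generic textbook illustration only, consider a symmetric stage game in which mutual honesty pays $R$ per stage, cheating an honest partner pays $T > R$ once, and the punishment phase pays $P < R$ per stage, with discount factor $\delta$ (all symbols are assumptions introduced for this sketch). Under a grim trigger, a one-time deviation is unprofitable when:

```latex
\underbrace{\frac{R}{1-\delta}}_{\text{stay honest}}
\;\ge\;
\underbrace{T + \frac{\delta P}{1-\delta}}_{\text{cheat once, then be punished}}
\quad\Longleftrightarrow\quad
R \;\ge\; T - \delta\,(T - P)
\quad\Longleftrightarrow\quad
\delta \;\ge\; \frac{T-R}{T-P}.
```

For example, $T = 5$, $R = 3$, and $P = 1$ give $\delta \ge 0.5$, so cooperation is sustainable only if the traders value future transactions at least half as much as current ones.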

2) Interaction between user and provider: While in [37]
the repeated game is used to deal with cheating between one
user and another, Antoniou et al. [38] developed a repeated
game model to handle cheating between a mobile user and a
network provider. In particular, the mobile user sends a
request for QoS support with a compensation price to the
provider. After receiving the request, the provider delivers
QoS to the user. However, the user can pay less than the price
that it offered. Similarly, the provider can be honest by
providing QoS as promised or cheat by providing lower QoS.
The players make decisions simultaneously, and they learn
each other’s actions only after the game ends. Thus, if the
game is played only once, the players will choose to cheat.
However, if the interaction between them is repeated over
multiple periods, cooperating by taking honest actions is the
best strategy. Additionally, to punish a deviating player, the
authors considered several punishment strategies, e.g., grim,
TFT, and leave-and-return. Based on the analysis, it is found
that the best strategy for the mobile user is leave-and-return
(i.e., cooperate as long as the provider cooperates and defect
for one period if the provider defects) and the best strategy
for the provider is TFT. The paper can be extended by
considering optimal pricing and resource allocation to support
QoS.
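
To make the difference between the grim and TFT punishments concrete, the sketch below plays both against a partner that cheats exactly once. The function names, the action labels "C" (honest) and "D" (cheat), and the payoff table are assumptions chosen for illustration; leave-and-return is omitted because its precise rules are defined in the cited paper.

```python
def grim(opponent_history):
    """Cooperate until the opponent defects once, then defect forever."""
    return "D" if "D" in opponent_history else "C"


def tit_for_tat(opponent_history):
    """Cooperate first, then copy the opponent's previous action."""
    return opponent_history[-1] if opponent_history else "C"


def play(strategy, opponent_actions):
    """Actions chosen by `strategy` against a fixed opponent sequence."""
    return [strategy(opponent_actions[:t])
            for t in range(len(opponent_actions))]


def discounted_payoff(own, other, payoff, delta):
    """Discounted sum of stage payoffs for the player choosing `own`."""
    return sum(delta ** t * payoff[(a, b)]
               for t, (a, b) in enumerate(zip(own, other)))


# Assumed prisoner's-dilemma style stage payoffs: (own action, other's action).
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

if __name__ == "__main__":
    partner = ["C", "D", "C", "C", "C"]  # cheats once, in the second period
    for strategy in (grim, tit_for_tat):
        own = play(strategy, partner)
        print(strategy.__name__, own,
              discounted_payoff(own, partner, PAYOFF, delta=0.9))
```

Grim never forgives the single cheat, whereas TFT punishes for one period and then resumes cooperation, which is why forgiving strategies tend to be preferred when long-term relationships are valuable.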

3) Interaction between user and eavesdropper: In [39], the
authors studied the security problem in multiple-input
single-output (MISO) wireless networks. The main goal is to
show that the assumption that eavesdroppers do not cooperate,
which is often used in the literature, does not always hold.
The authors considered a scenario in which two eavesdroppers
want to learn the information transmitted between
a legitimate transmitter, Alice, and a legitimate receiver, Bob.
Alice is assumed to be equipped with MISO
technology. The authors showed that the eavesdroppers can
cooperate if the interaction between them is repeated. The
players of the game are the eavesdroppers, and their actions
are to cooperate or not to cooperate. If they do not
cooperate, they choose zero power for the relay signal.
Alternatively, if they cooperate, the signal power is
greater than zero. The payoff of an eavesdropper is the mutual
information between Alice and itself. If the
game is a one-shot game, it is shown that the optimal strategy
for both eavesdroppers is not to cooperate. However, if the
game is played repeatedly for sufficiently many periods and
under a set of appropriate conditions on the channels and payoff
functions, it is proven that there is an equilibrium point in
which the eavesdroppers cooperate in the repeated game.

C. Quality-of-Service Management 

Next, we review the work related to quality-of-service
(QoS) management from the perspective of service providers.
Specifically, the service providers aim to maximize their
revenues while still guaranteeing QoS for their customers.
We consider three scenarios, summarized in Fig. 4,
corresponding to the interactions between a provider and
another provider, a mobile user, and a relay node.

Fig. 4. Interaction between (a) provider and provider, (b) provider and user, and (c) provider and relay node.

1) Operation management of base stations: In [40], the
authors studied a scenario with two operators as the players,
each of which has a set of base stations. The operators
want to maximize the number of users attached to their base
stations by increasing the transmission range of their base
stations. However, when the transmission range is increased,
the interference among the base stations also rises. Therefore,
the actions of the players are to choose the transmission range
of their base stations such that the coverage area is maximized
while the interference is minimized. In a one-stage game,
the players choose the maximum transmission range
for their base stations, causing severe interference. However,
if the game is repeated, there is an incentive for them to
cooperate by reducing the transmission range to an optimal
value (i.e., a cooperation point). The authors then applied a
trigger strategy for the players. At the beginning, the players
cooperate by setting the transmission range at the cooperation
point. Then, if any player deviates, the players set the
maximum transmission range for their base stations in the
next few periods until the deviating player cooperates again.
Under the proposed strategy, it is proven that the operators can
achieve the Pareto optimal equilibrium. However, the authors
did not show how to detect the deviating players.

While in [40] the authors considered the transmission range
control problem of base stations, in [41] the authors studied
the transmission power control problem. Similar to [40], a
repeated game is used to model the interaction among
base stations. However, in [41], the actions of the base stations
are to select the transmit power levels, and the payoff
depends on a density function of the mobile nodes. The authors
first used the Kalai-Smorodinsky bargaining solution [42] to
find the cooperative point of the game. Then, a trigger strategy
is used to enforce cooperation among the nodes. To detect
deviating nodes, the suspect detection procedure [43] is
developed, and it is shown that the players’ strategies converge
to an optimal point.

2) Infrastructure upgrading: The authors in [44] used game
theory to analyze the interaction between mobile users and
service providers. In each period (e.g., a monthly billing
cycle), the provider decides whether to invest money to
improve its infrastructure or to keep it unchanged, while the
users decide whether to stay with the same provider or switch
to another provider. In the one-shot game, both of them choose
the noncooperative strategy, i.e., the provider does not invest
because of the higher cost, and the user changes provider in
search of better performance. However, if the interaction
between them lasts for multiple periods, there is an incentive
for them to cooperate. By applying the grim trigger strategy,
it is shown that there exists a subgame perfect equilibrium for
the users and provider in which the provider invests money to
improve the infrastructure and the users do not change
provider. Additionally, the authors considered the case of
imperfect monitoring. Specifically, the authors assumed that
the users’ actions are public and monitored by the provider.
However, the provider’s action is private, and thus the users
do not know the actions of the provider. This can
reduce the profit of the provider because, in some cases,
even though the provider improves the network, mobile users
still receive a low quality of service due to other factors
(e.g., shadowing), and thus they change provider.
The authors then established the condition under which the
provider continues to cooperate, as a function of the patience
and the monitoring accuracy of the users. However, the
practicality of this approach has to be evaluated further since
it may be intractable to measure the patience and the
monitoring accuracy of users.

3) Bandwidth sharing with relay nodes: In [45], the authors
considered the interaction between a base station (eNB) and
a relay node (RN) in a downlink dual-hop LTE network. In
the repeated game, the players are the base station and the RN.
Their actions are to choose the number of physical resource
blocks (PRBs) to access. Assigning more PRBs
to a node can improve its throughput. However, since PRBs
are limited, the interference between the nodes sharing the
same PRB can increase. Therefore, a node can choose one
of two strategies, either to cooperate by choosing PRBs with
high channel gains and letting its opponent use these PRBs,
or not to cooperate by utilizing all the PRBs assigned. To make
the players cooperate in the repeated game, a trigger strategy
is used that punishes a deviating node for T periods.
Furthermore, to reduce the number
of punishment periods, the authors proposed using a penalty
factor in the payoff function of the players. This factor reduces
the additional throughput achieved if the player is punished.
Simulation results show that the proposed solution can
improve the total achievable throughput.
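
The role of the punishment length T can be illustrated with a generic calculation. The sketch below assumes a discount factor `delta`, a per-stage cooperative payoff `u_coop`, a one-time deviation payoff `u_dev`, and a per-stage payoff `u_pun` for the deviator during punishment; these names and the numbers in the example are hypothetical and do not come from the cited paper.

```python
def min_punishment_length(u_coop, u_dev, u_pun, delta, max_len=1000):
    """Smallest T for which a one-time deviation does not pay.

    A deviation gains (u_dev - u_coop) once and then loses
    (u_coop - u_pun) in each of the next T stages, discounted by delta.
    Returns None if no T up to max_len is long enough.
    """
    gain = u_dev - u_coop
    if gain <= 0:
        return 0  # deviating is never attractive, no punishment needed
    loss = 0.0
    for length in range(1, max_len + 1):
        loss += delta ** length * (u_coop - u_pun)
        if loss >= gain:
            return length
    return None


if __name__ == "__main__":
    print(min_punishment_length(u_coop=3.0, u_dev=5.0, u_pun=1.0, delta=0.9))
    print(min_punishment_length(u_coop=3.0, u_dev=5.0, u_pun=1.0, delta=0.4))
```

Lowering `u_pun`, which is what a penalty factor in the payoff effectively does, increases the per-stage loss and therefore shortens the punishment needed to deter a deviation.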

Summary: In this section, we have identified three main
issues in cellular and WLAN networks (CWLANs) and
reviewed applications of repeated games for these networks. We
summarize the issues along with references in Table II. From
the table, we observe that many papers investigate the multiple
access control problem, while security and quality-of-service
(QoS) problems are less studied. Additionally, Fig. 5 shows the
relation among different types of repeated game models, their
strategies and solutions. From Fig. 5, it is found that while
conventional repeated games (i.e., with perfect information and
monitoring) are mostly used, their variations (e.g., a repeated
game with imperfect monitoring) are not much adopted.

TABLE II
Applications of repeated games in cellular and WLAN networks (SPE = subgame perfect equilibrium, BS = base station, TFT = tit-for-tat, CRISP = cooperation strategy via randomized inclination to selfish/greedy play)

Problem | Article | Players | Actions | Payoff | Strategy | Solution
Multiple Access Control | [14] | wireless nodes | transmission power levels | successful transmission data | Cartel maintenance | SPE
Multiple Access Control | iE3 | wireless nodes | transmission rates | profits minus cost | Cartel maintenance | SPE
Multiple Access Control | [17] | wireless nodes | transmission rates | profits minus cost | Cartel maintenance | perfect public
Multiple Access Control | lUU | wireless nodes | transmission power levels | profits minus cost | forgiving | SPE
Multiple Access Control | [20], [21] | wireless nodes | transmission power levels | a function of transmission rate and power level | Cartel maintenance | perfect public
Multiple Access Control | [23] | wireless nodes | channel access probability | throughput | adaptive | Pareto optimal
Multiple Access Control | [25] | wireless nodes | backoff configuration | throughput | CRISP | Pareto optimal
Multiple Access Control | [27] | mobile users | the use of bandwidth | throughput | TFT | Pareto optimal
Multiple Access Control | [29] | mobile users | report traffic demand | successful data transmission | cheat-proof | Pareto optimal
Multiple Access Control | [30] | wireless nodes | report channel condition | a function of data transmission rate | |
Multiple Access Control | ED | mesh clients | report maximal data rate | achievable data rate | | SPE
Multiple Access Control | [33] | wireless users | channel information and service requirement | profits minus cost | cheat-proof | Pareto optimal
Multiple Access Control | [34] | wireless nodes and BS | nodes: power levels, BS: decoding order | achievable rate for wireless nodes and revenue for BS | | SPE
Security | [37] | mobile seller and buyer | honest or cheat | profits | cheat-proof | SPE
Security | [38] | mobile users and network provider | honest or cheat | profits | users: leave-and-return, provider: TFT | SPE
Security | [39] | eavesdroppers | cooperate or noncooperate | mutual information | punishment trigger | SPE
QoS | [40] | BSs | transmission range of BSs | cover area | Cartel maintenance | Pareto optimal
QoS | [41] | BSs | transmission power level | a function of achievable rate and density function | Cartel maintenance | Pareto optimal
QoS | [44] | mobile users and service provider | users: stay/leave, provider: improve/remain | QoS for users and profits for provider | Grim | SPE
QoS | [45] | BS and a relay node | the number of physical resource blocks | throughput | forgiving | Pareto optimal

IV. Applications of Repeated Games in Wireless 
Ad Hoc Networks 

In this section, we review the repeated game models developed
for wireless ad hoc networks, which are also known as
infrastructureless networks. Examples of such networks are
multihop networks, sensor networks, cooperative transmission
networks, and peer-to-peer networks. In wireless ad hoc
networks, wireless nodes can communicate directly with each
other without relying on any infrastructure or
pre-configuration, as illustrated in Fig. 6. Thus, wireless ad
hoc networks have many advantages and applications in
practice, as discussed in [46], [47]. However, the lack of
infrastructure poses many issues such as interference,
decentralization, limited range, and insecurity [46], [47].
In this section, we review the applications of repeated
games in wireless ad hoc networks in the following aspects.

• Packet forwarding: In multihop networks, wireless
nodes transmit packets to a distant destination with
the support of intermediate nodes. However, the
nodes may belong to different authorities, and the forwarding
process consumes a certain amount of the forwarders’ or
relays’ resources. Therefore, repeated games are used to
motivate the nodes to cooperate.

• Cooperative transmission: Cooperative transmission
networks, also simply called relay networks, can help improve
performance in terms of speed and reliability through
amplify-and-forward and decode-and-forward techniques.
However, energy is a crucial resource and has to be utilized
efficiently. Repeated games are also used to encourage
cooperation in cooperative transmission networks.

• Resource sharing in peer-to-peer (P2P) networks: In
a P2P network, users help each other to acquire the
desired services, for example, content distribution.
Repeated games are used in this network to prevent free-riding
by users and to enforce cooperation.

• Miscellaneous issues: Besides the aforementioned issues,
repeated games are also used to address other issues in
wireless ad hoc networks such as clustering, spectrum
access, and security.

A. Packet Forwarding 

Due to the cost of forwarding packets over multihop
networks (Fig. 7(a)), we need to design mechanisms to encourage
nodes to cooperate and enhance network performance. In
general, there are three mechanisms used to incentivize
nodes to cooperate, i.e., credit-based, reputation-based,
and game-based methods. In credit-based methods,
e.g., Nuglet [48] and Sprite [16], nodes receive a payment
if they agree to forward packets for others. However, this
method needs a centralized accounting server to manage
payments. Consequently, it may not be suitable for decentralized
networks such as ad hoc networks. Therefore, in the following,
we will discuss the other two approaches, i.e.,
reputation-based and game-theoretic.

Fig. 5. Relation among types of repeated games, strategies, and solutions (RG = repeated game, SPE = subgame perfect equilibrium, TFT = Tit-for-Tat) in
cellular and WLAN networks.

Fig. 6. Comparison between infrastructure networks (cellular network and
WLAN) and wireless ad hoc networks.

1) Reputation-based Schemes: Unlike the credit-based 
approach, which needs a central authority for payment 
management, the nodes in reputation-based multihop networks 
monitor and observe their neighbors’ behaviors 
autonomously and independently. In reputation-based networks, 
a node updates the reputation values of its neighbors 
based on its own observations and on information provided by other 
nodes. The node then decides whether to forward or reject packets for 
its neighbors. Accordingly, the nodes have to take into account 
the future effects of their present actions. This implies that 
repeated games can be used to model the interaction among 
nodes in reputation-based multihop networks. 

General reputation-based networks, e.g., SORI [30] and 
Catch [52], work on the assumption that every node 
maintains the reputation of its neighbors, i.e., the fraction 
of packets that they forward. However, when collisions happen, 
a node can be treated as defecting, and thus its reputation 
will be degraded even though it is cooperative. To deal with the 
collisions, the authors in [52] modeled the interaction between 
two nodes as a Prisoner’s Dilemma with noise era and used 
the generous TFT strategy to enforce cooperation among the 
nodes. Additionally, to detect and sustain the cooperation, a 
reputation-based mechanism is proposed. In the mechanism, 
each node evaluates its neighbors’ reputation based on their 
packet dropping probabilities. Under appropriate conditions, it 
is proven that mutual cooperation is a subgame perfect 
equilibrium (SPE) of the game. 
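
The effect of generosity under collisions can be illustrated with a small simulation. The following is only a sketch under stated assumptions, not the model of [52]: two nodes play repeatedly, a real forward is perceived as a drop with probability `noise` (a collision), and a perceived drop is forgiven with probability `generosity`. The function name and all parameter values are hypothetical.

```python
import random


def simulate(generosity, noise, rounds=10000, seed=0):
    """Return the fraction of rounds in which both nodes actually forward."""
    rng = random.Random(seed)
    # What each node believes the other did in the previous round
    # (True = forwarded). Both start by assuming cooperation.
    seen_by_a, seen_by_b = True, True
    mutual = 0
    for _ in range(rounds):
        # Generous TFT: copy the perceived last move, but forgive a
        # perceived drop with probability `generosity`.
        a_forwards = seen_by_a or rng.random() < generosity
        b_forwards = seen_by_b or rng.random() < generosity
        mutual += a_forwards and b_forwards
        # A collision makes a real forward look like a drop.
        seen_by_a = b_forwards and rng.random() >= noise
        seen_by_b = a_forwards and rng.random() >= noise
    return mutual / rounds
```

With `generosity = 0` the rule reduces to plain TFT, and in this toy model a single misperceived round is never repaired. A positive `generosity` lets the pair return to mutual forwarding after a collision.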

In [52], achieving full cooperation between users requires 
perfect estimation of reputation. Therefore, in [57], the 
authors proposed a reputation strategy, namely the distributed and 
adaptive reputation mechanism (DARWIN), that can achieve 
full cooperation without perfect estimation. With this 
strategy, each node estimates the reputations of its neighbors and 
then shares this information with others. Based on the reputation 
information received, each node evaluates whether its 
neighbor is cooperative or noncooperative. Then, by using a 
modification of the TFT strategy called Contrite TFT [117], 
it is proven that DARWIN can also achieve a subgame perfect 
equilibrium. 

Fig. 7. Illustration of (a) multihop networks, (b) relay networks, and (c) peer-to-peer networks 

Extending [52], Ji et al. [53] examined a belief-based 
packet forwarding approach, developed in [54], to 
cope with not only noise but also imperfect observation in 
reputation-based networks. In this approach, a node 
maintains a belief, i.e., a probability distribution over the 
other nodes’ actions, to estimate what they did. Based on this belief, 
the node decides whether to forward or reject a packet. After 
each game stage, the nodes update their beliefs 
by using Bayes’ rule [55]. Their strategies are then adjusted 
for the next game stage according to their beliefs. It is 
proven that, under some appropriate 
conditions, the proposed strategy is a sequential equilibrium. 
A sequential equilibrium [55] is a well-defined counterpart 
of a subgame perfect equilibrium for repeated games with 
imperfect monitoring. It ensures that 
no user has an incentive to deviate. The simulations show 
that the proposed strategy achieves not only the sequential 
equilibrium, but also near-optimal performance. 
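
The Bayes update used in belief-based forwarding can be sketched as follows. This is an illustration only, not the belief system of [53]: the belief is reduced to a single probability that a neighbor is cooperative, and the two observation likelihoods are hypothetical values chosen for the example.

```python
def update_belief(prior, observed_forward,
                  p_obs_if_coop=0.9, p_obs_if_selfish=0.2):
    """Posterior probability that a neighbor is cooperative.

    prior:            current belief that the neighbor is cooperative
    observed_forward: True if a forwarding signal was observed this stage
    p_obs_if_coop:    assumed P(signal | cooperative neighbor)
    p_obs_if_selfish: assumed P(signal | selfish neighbor)
    """
    like_coop = p_obs_if_coop if observed_forward else 1.0 - p_obs_if_coop
    like_self = p_obs_if_selfish if observed_forward else 1.0 - p_obs_if_selfish
    evidence = like_coop * prior + like_self * (1.0 - prior)
    return like_coop * prior / evidence  # Bayes' rule
```

Starting from a neutral belief of 0.5, one observed forward raises the belief and one missed forward lowers it. A node could then forward only while its belief stays above a chosen threshold.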

Similar to [53], the authors in [60] also studied belief-based 
approaches to obtain sequential equilibria. However, in [60], 
the authors proposed a more efficient method to circumvent 
the complexity of updating beliefs as in ED. The idea is 
based on a state machine. In particular, each node has two 
states (corresponding to two actions), called cooperate (C) 
and noncooperate (N). Based on signals observed from the 
environment, each node decides whether to stay in the 
current state or switch to the other state. The simulation results 
indicate that the proposed 
strategy outperforms other noncooperative strategies. 

In [61], the authors proposed a solution that does not even 
need to compute belief functions for the nodes as in [53] 
and [60]. The solution is based on a belief-free equilibrium 
[64], [65], which does not require determining other nodes’ 
private histories, and hence computing the optimal strategy 
is not necessary. Consequently, the strategies considered 
in this paper do not depend on a node’s own action and the 
equilibrium construction is not as complex as in [60]. The 
authors considered a repeated game with random states, 
modeled as a stochastic game instead of a simple 
repeated game, because in each game stage a node may or 
may not have packets to send. Thus, the stages of the game 
can be random and differ over time. Additionally, the 
authors used a monitoring technique (similar to a watchdog 
mechanism [66]) for the nodes to monitor the behavior of 
their neighbors. The signals obtained by observing the 
opponents’ actions are used to adjust the packet forwarding 
probability. If the nodes are observed to be cooperative, the 
packet forwarding probability is one. Otherwise, the 
probability is reduced. By using this strategy, it is proven 
that the nodes can achieve the belief-free equilibrium solution. 

The authors in ED also proposed a new method to overcome 
the limitations in [53] by using repeated games with 
communication and private monitoring (RGC&PM) [59]. In 
RGC&PM, the nodes are assumed to be able to communicate 
with other nodes to exchange information about the behaviors 
that they observe. The information can be used to identify 
the deviating node. Then, a distributed learning repeated 
game with communication framework was proposed to find 
and maintain cooperation among the nodes. The authors also 
showed that, with the proposed repeated game framework, 
any cooperation equilibrium that is more efficient than the 
Nash equilibrium can be achieved by using some punishment 
strategies. Learning algorithms are then introduced to 
help the nodes achieve the efficient cooperation equilibria. 
The simulations demonstrate that the proposed framework can 
achieve 70% to 98% of the performance of 
the centralized optimal solution. 

While in [53], the authors examined repeated games with 
imperfect observation, in [56], the authors investigated 
repeated games with imperfect monitoring. There is a slight 
difference between them. In imperfect observation games, the 
players are not sure about the actions of others in each game 
stage. As a result, they need to maintain a belief function 
to evaluate opponents’ actions. By contrast, in imperfect 
monitoring games, the players cannot explicitly monitor other 
nodes’ actions, and thus they use a random public signal to 
infer those actions. The notion of a “public signal” was introduced 
in I CD- and it has to satisfy certain properties. By using 
the public signal, it is proven in ED that there exists a perfect 
public equilibrium for the repeated game with imperfect 
monitoring. In the context of wireless ad hoc networks, a profile of 
public strategies is a perfect public equilibrium if, after every 
game stage, it forms a Nash equilibrium 
from that stage onward. The authors then applied the grim 
trigger strategy for the nodes to punish deviators and force 
them to cooperate. 

In the papers reviewed above, a node that does not 
forward packets for others is treated as a selfish 
node and is punished. However, in many practical cases, 
nodes do not forward packets because they are unable to 
do so. For example, a node occasionally cannot forward 
packets due to energy depletion. If such nodes are 
treated as selfish and excluded from the cooperation, 
the performance of the network will be degraded severely. 
Therefore, the authors in 163 focused on designing a power 
control mechanism for nodes under uncertainty. The 
actions and payoffs of the nodes are defined slightly 
differently from those in other papers. Specifically, in each 
round, each node chooses not only its transmit power level 
but also its forwarding probability. The node then broadcasts 
this information to its neighbors. The main goal of the node 
is to maximize its own throughput while minimizing its 
energy consumption. Therefore, the payoff function is defined 
as the ratio between the successful self-transmission rate and 
the total power used (for both self-transmission and packet 
forwarding). In the game, each time slot is divided into two 
phases. In the first phase, each node decides 
on the amount of energy used to send packets (including its 
own packets). The node will forward packets for others if it 
believes that the energy cost of forwarding is lower 
than the rewards to be received in the future. The beliefs of 
the nodes are then updated using Bayes’ rule. In the 
second phase, each node determines the probability of 
forwarding packets for others in the next time slot. As in 
the first phase, the beliefs are used to infer the forwarding 
probability. The proposed strategy is shown numerically to be a 
Pareto-dominant equilibrium. However, a theoretical proof 
is still open. 

2) Game-based Schemes: In the reputation-based approach, 
intermediate nodes decide whether to forward packets based on 
reputation. Nevertheless, propagating the nodes’ reputation 
adds overhead to the network. Furthermore, the 
reputation can be lost, forgotten, or manipulated, and thus 
game-based methods have been introduced. In game-based methods, 
a node decides whether to forward a packet according to its 
defined strategy, thereby avoiding the problems of the 
reputation-based approach. 

In [68], repeated games are used to model the interaction 
of nodes. The nodes as the players need to choose the action 
to forward or reject when they receive a packet from other 
nodes. If the node accepts the packet, it has to spend a certain 
amount of energy to forward the packet. Thus, the node has 
to balance between the number of its own packets and the 
number of packets from other nodes to be forwarded. In 
the repeated game, the authors used a punishment strategy, 
namely generous tit-for-tat (generous TFT), to make the nodes 
cooperate, with the aim of reaching a Pareto optimal point. To 
make the decision to accept or reject, the node has to monitor 
all packets that it sends and that the other nodes send to it. 
Based on this information, the node computes the acceptance 
probability and makes the decision to accept/reject incoming 
packets. 

A similar model and approach are also considered in [69]. 
However, in this paper, instead of adjusting the acceptance 
probability, the action of the node is to determine the number 
of packets that it will send and the number of packets that it will 
receive and forward. The TFT strategy is used. It is shown that 
a node forwards packets only if the amount 
of traffic forwarded by others is at least equal to the amount that the node 
forwards. However, this can lead to unfairness because the 
deviating node will be punished severely forever. 

To overcome the unfairness issue in [68], the authors in m 
proposed a new punishment scheme, which can be seen as 
an improvement of the generous TFT strategy used in [68]. At 
the beginning of the game, the nodes choose the strategy 
“cooperate” by forwarding the packets that they receive. If a 
node defects from the cooperation, all the other nodes 
continue monitoring and cooperating with this node for the 
next p − 1 stages. If the deviating node is still noncooperative 
after p stages, all nodes punish the deviating node 
for q stages by rejecting the packets from this node. During 
the punishment, if the deviating node regrets and wants to 
cooperate again, it can help other nodes forward packets without 
sending its own packets for r stages. After that, the deviating 
node can return to the cooperation. Otherwise, this node will 
be punished forever. By using this strategy, the nodes can avoid 
the unfair punishment as in [68], and the network performance 
can be improved significantly. 
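
The tolerate, punish, and redeem cycle described above can be sketched as a small state machine. This is one possible reading of the prose, not the authors' algorithm: it assumes that punishment starts after p consecutive drops, that redemption requires r consecutive relay-only stages completed within the q punishment stages, and that the node is excluded permanently otherwise. All names are hypothetical.

```python
def group_response(actions, p, q, r):
    """How the other nodes treat one node, stage by stage.

    actions: sequence of 'F' (node forwards for others) or 'D' (node drops)
    Returns a list of 'serve' / 'reject', one entry per stage.
    """
    state = 'tolerate'
    defect_streak = 0   # consecutive drops while still being served
    punish_left = 0     # remaining punishment stages
    redeem_streak = 0   # consecutive relay-only stages during punishment
    out = []
    for a in actions:
        if state == 'tolerate':
            out.append('serve')
            defect_streak = defect_streak + 1 if a == 'D' else 0
            if defect_streak >= p:
                state, punish_left, redeem_streak = 'punish', q, 0
        elif state == 'punish':
            out.append('reject')
            punish_left -= 1
            redeem_streak = redeem_streak + 1 if a == 'F' else 0
            if redeem_streak >= r:
                state, defect_streak = 'tolerate', 0  # back to cooperation
            elif punish_left == 0:
                state = 'banned'                      # punished forever
        else:
            out.append('reject')
    return out
```

For example, with p = 2, q = 3, and r = 2, a node that drops twice and then relays for two stages is served again from the fifth stage onward.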

Similar to ED, the authors in [77] also proposed a fair 
solution, called a restorative trigger strategy. In this strategy, 
the nodes cooperate by forwarding packets 
in the first two stages. In each game stage, if the cooperation 
is maintained, the node updates its estimate of the 
“selfishness”. If the “selfishness” is lower than a predefined threshold, 
the node still cooperates in the next stage. Otherwise, it 
switches to a defection state and reports the selfish node to others. 
The selfish node is removed from the cooperation, and 
if it wants to cooperate again, it has to forward 
packets for other nodes until its cooperation expectation 
is greater than a certain threshold. A 
similar approach using a punishment strategy and a learning 
algorithm is considered in rm . The action is the packet 
forwarding probability, which is optimized to achieve better 
network performance. 

In the same context, the authors of El developed a new 
strategy to achieve better results than that of CD- This strategy 
is based on the idea of the weakest link (TWL), borrowed from 
a famous TV game show that encourages players to cooperate. 
The nodes form a route from a source to a destination, and the 
nodes in this route are considered as the candidates of a chain 
in the TWL game. Each node has a payoff defined based 
on its cooperation level, i.e., its packet forwarding probability. 
To enforce cooperation among the nodes, an infinitely repeated 
self-learning game model is developed. In each game stage, each 
node observes its payoff and the cooperation level of the others. 
Accordingly, the node adjusts its forwarding probability. If a 
node defects from the cooperation, it is excluded from 
the cooperation for a certain number of stages. If the node 
wants to return to the cooperation, it has to raise its forwarding 
probability and notify the others. The simulation results 
show that the proposed solution of El outperforms solutions 
of fra. 

In the aforementioned strategies, there exist an infinite 
number of Nash equilibria, and in general not all of them are 
efficient. Therefore, in [72], the authors evaluated a few criteria 
with the aim of removing the Nash equilibria that are less robust, 
less rational, or less likely. These criteria are subgame 
perfection, Pareto optimality, social welfare maximization, 
proportional fairness, and absolute fairness. Furthermore, 
in ea, the authors considered the case where the nodes can 
report information dishonestly for their own benefit. The 
authors demonstrated that in the considered packet forwarding 
game, there is no incentive for the nodes to report 
their private information honestly. Therefore, the authors concluded 
that when cheating is possible, a node will not forward 
more packets than its opponent does for it. To address this 
problem, a Cartel maintenance profit-sharing (CAMP) strategy 
was proposed in ED. With CAMP, the nodes start the game 
by cooperating, i.e., reporting the true forwarding cost. Then, 
in each round, the nodes check their payoffs. If their payoffs 
are lower than the expectation thresholds, they switch to playing 
the noncooperative game for a certain number of periods before 
returning to the cooperation state. 

In the papers reviewed above, the nodes implement 
their strategies under the assumption that they are able to 
monitor perfectly the actions of all the other nodes in the 
network. However, this assumption may not hold because of 
hardware limitations or noise/interference, especially in a 
wireless environment. To address the imperfect monitoring 
problem, the authors in C3 proposed a repeated game model 
with communication. The key idea is based on Aoyagi’s 
game [80], which is a repeated game with private signals 
communicated among players. After receiving private signals 
at the end of each game stage, the nodes decide to cooperate 
or defect. However, selfish nodes can report fake information 
for their own profit. Therefore, the TFT strategy is 
applied to force the nodes to cooperate by revealing 
true information. The equilibrium solution obtained by the 
proposed strategy is a perfect public equilibrium. 

In previous work, the strategies applied in repeated games 
are uni-strategy, i.e., all nodes take the same strategy. However, 
in some cases, nodes may not follow it and may use the 
strategies that they prefer. For example, one node can use the TFT 
strategy while others apply the grim strategy. Alternatively, 
a node can take the TFT strategy at the beginning of the 
game but switch to the grim strategy later. The divergent 
strategies form a mixed-strategy repeated game that is very 
complex to analyze. Consequently, in [75], the authors used 
simulations to study the cooperation behaviors among nodes, 
which are similar to an evolutionary game GO. Five strategies 
are studied, namely always cooperate, always defect, random, 
TFT, and the gradual strategy. The core idea of the gradual strategy 
is based on the adaptiveness strategy C6). In particular, a node 
starts by cooperating and keeps using this strategy as long 
as its opponents cooperate. Then, if this node detects someone 
deviating, it defects for one stage and cooperates for two 
stages. After the N-th defection, the node defects for N 
consecutive stages and cooperates for two stages. Many results 
are highlighted by comparing the performance of different 
strategies. For example, in the environment without noise, the 
gradual strategy achieves the best performance. However, with 
noise, the gradual strategy performs merely better than others 
when the noise ratio is high, e.g., 30%. 
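
The gradual strategy described above can be sketched as follows. This is a minimal illustration, not the implementation studied in [75]: it assumes that defections observed while a punishment is already running are counted but do not restart the punishment. The function name is hypothetical.

```python
def gradual_moves(opponent_moves):
    """Own moves ('C' or 'D') under the gradual strategy.

    opponent_moves: sequence of the opponent's moves, one per stage.
    The reaction to a move observed in stage t starts in stage t + 1.
    """
    moves = []
    defections_seen = 0
    queue = []  # scheduled moves: N defections followed by two cooperations
    for opp in opponent_moves:
        moves.append(queue.pop(0) if queue else 'C')
        if opp == 'D':
            defections_seen += 1
            if not queue:  # assumption: do not restart a running punishment
                queue = ['D'] * defections_seen + ['C', 'C']
    return moves
```

The first observed defection triggers one stage of defection and two of cooperation. The second triggers two stages of defection and two of cooperation.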

Similar to d, evolutionary game theory (EGT) was 
used in [78] to enforce cooperation among selfish users. 
The strategies studied include TFT, grim, Pavlov [62], and 
pPavlov [63]. Using the EGT framework, the authors 
showed that under the Pavlov or pPavlov strategies, no selfish 
node can benefit from playing a noncooperative strategy. 
The advantage of the Pavlov strategy is that it can be 
implemented in a distributed fashion, as it requires only local 
information. 
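
The classical Pavlov rule (win-stay, lose-shift) can be written in a few lines. This sketch covers the textbook rule only, which may differ in detail from the variants in [62] and [63]. A node cooperates exactly when both players made the same move in the previous stage, which needs only the local history of the pair. Function names are hypothetical.

```python
def pavlov(my_last, opp_last):
    """Win-stay, lose-shift: play 'C' iff both made the same move last stage."""
    return 'C' if my_last == opp_last else 'D'


def play(a_last, b_last, stages):
    """Joint moves over `stages` stages when both nodes follow Pavlov."""
    trace = []
    for _ in range(stages):
        a_last, b_last = pavlov(a_last, b_last), pavlov(b_last, a_last)
        trace.append((a_last, b_last))
    return trace
```

Starting from a stage in which one node cooperated and the other defected, both nodes defect once and then return to mutual cooperation.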

3) Methods to Deal with Malicious Users: In the previous 
subsections, we reviewed the methods to deal with selfish users 
and enforce cooperation among them. In this subsection, we 
present the applications of repeated games to deal with 
malicious users. While selfish users act noncooperatively only 
for their own benefit, malicious users aim to harm and 
intentionally degrade network performance. 
Theodorakopoulos et al. studied the 
problem of malicious behaviors in wireless ad hoc networks 
in a series of papers. 
The first paper [82] considers one malicious user in the 
network. The case with more than one malicious user 
is then considered in [83]. Next, in [84], a mechanism to detect 
malicious users is presented. 

In particular, the authors assumed that the users who want 
to disrupt the network are “Bad” users and the rest are the 
“Good” users. Bad users aim to degrade network performance, 
while Good users aim to maximize their benefits in a long-term 
basis. The actions for each node are to forward (cooperation) 
or reject (defection) packets. Good users can choose to forward 
or reject packets for their neighbors as long as their long¬ 
term payoffs are maximized. Moreover, the authors assumed 
that Bad users are intelligent. Specifically, instead of always 
choosing to reject the packets, Bad users can choose to forward 
packets randomly to deceive other Good users. Thus, it is 
difficult for the Good users to detect and punish the Bad users. 
The general scenario considered in these papers is as follows. 
Good users want to cooperate with other Good users, but not 
with Bad users. Thus, Good users try to find Bad users as 
soon as possible thereby reducing harmful interactions with 
Bad users. However, Good users do not know who Bad users 
are. The Good users can detect Bad users only if the game is 
played repeatedly. 

In [82], the authors considered only one malicious (Bad) 
user in the network. This Bad user is unknown by Good users 
in advance, but they can gradually detect the Bad behavior 
through repeated interactions. To detect the Bad user, the 
Good users will use a randomized policy, i.e., Good users 
will cooperate with probability p independently at each round. 
The probability is set to maximize Good users’ payoffs. With 
the randomized policy, it is proven that the Bad user will be 
detected by Good users after some periods of the interaction. 
The authors finally showed that the proposed strategy can 
achieve Nash equilibria and form the cooperation among Good 
users. 

In [83], the authors extended the model to the case with multiple 
malicious users in the network. Instead of using the randomized 
policy as in [82], the authors applied a fictitious play 
model [85] and proposed an equilibrium algorithm for Good 
users. In the fictitious play, each Good user assumes that its 
neighbors’ actions are independent and identically 
distributed according to a Bernoulli distribution. 
Hence, at each game stage, the Good users choose 
the action that yields the highest payoff for them given the 
estimates of their neighbors’ strategies. The main conclusion 
is that if the ratio of cost to benefit for the Good users is high, 
the achievable payoff of Bad users will be low. Finally, in [84], 
the authors proposed a mechanism to detect malicious users 
for the game model proposed in [83]. In this mechanism, a 
Good user constructs a star topology of connections with 
its neighbors, in which the Good user is the center. The central 
Good node is able to monitor all actions of its neighbors as 
well as the payoffs obtained. From this information, the central 
Good user can identify the Bad user. 
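
One fictitious play step for a Good user can be sketched as below. This is an illustration under assumptions, not the algorithm of [83]: the Good user is assumed to gain `benefit` when it cooperates with a neighbor that forwards, to lose `cost` when it cooperates with a neighbor that drops, and to get zero when it defects. The payoff structure and all names are hypothetical.

```python
def fictitious_play_action(history, benefit, cost):
    """Best response of a Good user to one neighbor; True means cooperate.

    history: the neighbor's past actions, True = forwarded.
    """
    # Empirical (Bernoulli) estimate of the neighbor's forwarding
    # probability; with no data yet, optimistically assume cooperation.
    q = sum(history) / len(history) if history else 1.0
    expected_if_cooperate = q * benefit - (1.0 - q) * cost
    return expected_if_cooperate >= 0.0  # defecting is assumed to pay 0
```

A neighbor that forwards rarely drives the estimate down until cooperating with it no longer pays, which is how repeated play gradually isolates a Bad user in this toy setting.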

In addition to the methods proposed by Theodorakopoulos, 
Tootaghaj et al. [86] also introduced a security mechanism 
that detects not only selfish but also malicious users. In 
this mechanism, each node generates its private/public keys 
and shares them over the network through an independent public 
key infrastructure. After receiving packets, a node 
sends a confirmation message to its upstream hop. Based on 
the confirmation messages, the nodes can identify selfish or 
malicious nodes and punish them accordingly. 

B. Cooperative Transmission 

Relay networks based on cooperative transmission typically 
work in two phases, as illustrated in Fig. 7(b). In the first 
phase, a source node broadcasts a message to the destination 
and relay nodes. In the second phase, the relay nodes help the 
source node transmit data to the destination. The destination 
node combines all signals received from the source and relay 
nodes to extract information. By using relay nodes, the quality 
of the signals received at the destination can be significantly 
improved. However, the relaying process consumes additional 
energy of the relay nodes, which may discourage relay nodes 
from the cooperation. In the following, we review the repeated 
game models developed for cooperative transmission in relay 
networks. In general, there are two methods used for cooperative 
transmission, namely, amplify-and-forward (AF) and 
decode-and-forward (DF) [87]. For the AF transmission, the 
relay nodes amplify signals received from the source node 
before transmitting to the destination. For the DF transmission, 
relay nodes decode the source information before forwarding 
to the destination. 

1) Amplify-and-Forward Relaying Transmission: In [88], 
Yang et al. considered a relay network with two source 
nodes along with two corresponding destination nodes. In this 
network, each source can use the other source node as a relay 
node to transmit data to its destination through the half-duplex 
orthogonal cooperative AF protocol, as shown in Fig. 8(a). 



Fig. 8. The difference between (a) single-relay orthogonal cooperation 
protocol and (b) single-relay non-orthogonal cooperative protocol. 

The authors studied two cases for this model, namely, without 
and with fading channels. In the case of non-fading channels, 
similar to packet forwarding games, the authors modeled 
the interaction between the two source nodes as a repeated game 
and showed that there exists a cooperative Nash equilibrium. 
In this game, the players are the two source nodes, their actions 
are to relay or not, and their payoff is the amount of energy 
saved. With fading channels, finding the cooperation points for 
the source nodes becomes more complicated. 
Specifically, the channel status differs from one game stage 
to the next, and thus the amount of energy saved also differs. 
Consequently, the payoff matrix varies over game stages. 
To enforce cooperation between the source nodes, the authors proposed 
a conditional trigger strategy. In this strategy, the decision of 
each source node depends on the instantaneous value of the 
current payoff matrix and the statistics of its future payoffs, which 
can be interpreted as the value of relay energy for other source 
nodes. The proposed conditional trigger strategy is proven 
to be a cooperative Nash equilibrium. Simulations 
show that the proposed strategy achieves high energy 
efficiency, close to that of the centralized optimization 
solution. A similar model is also presented in [96]. However, 
in [96], the authors used the Q-learning algorithm to help 
nodes achieve the expected values. 

The authors in [94] proposed an extension of [88] that 
resolves two limitations of [88]. First, the authors introduced 
a solution that does not rely on centrally controlled 
energy allocations as presented in [88], [96]. Second, the 
authors extended the model to more than two source nodes. 
Specifically, the authors used a bargaining solution to find 
efficient energy allocation strategies that can be computed 
locally by the source nodes. Then the repeated game framework 
is used to determine sufficient conditions for mutual 
cooperation between the source nodes. In the repeated game, a 
trigger strategy is introduced to encourage nodes to cooperate. 
A node has to choose the action “relay” if the bargaining 
solution specifies that the required direct transmission energy 
for that node is greater than zero. Otherwise, this node 
will be punished forever. Under the proposed strategy, and 
provided that the necessary conditions are satisfied, it is proven that mutual 
cooperation is possible in both fading and non-fading channel 
cases. Furthermore, to extend the model to more than two 
players, the “stable roommates” algorithm m is applied to 
form partnerships among players. Numerical results show that 
the proposed strategies can dramatically improve the energy 
efficiency of the relay network. 
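
Why a "punished forever" trigger strategy can sustain relaying is captured by the standard discounted repeated-game condition. The sketch below is generic and is not the specific condition derived in [94]. It assumes constant stage payoffs `u_coop` (mutual relaying), `u_dev` (one-shot deviation), and `u_punish` (permanent punishment), with `u_dev > u_coop > u_punish`. Cooperation is then sustainable when the discount factor is at least (u_dev − u_coop) / (u_dev − u_punish).

```python
def grim_trigger_threshold(u_coop, u_dev, u_punish):
    """Smallest discount factor at which deviating does not pay.

    Follows from u_coop / (1 - d) >= u_dev + d * u_punish / (1 - d),
    i.e. d >= (u_dev - u_coop) / (u_dev - u_punish).
    """
    if not (u_dev > u_coop > u_punish):
        raise ValueError("expected u_dev > u_coop > u_punish")
    return (u_dev - u_coop) / (u_dev - u_punish)
```

For instance, with illustrative payoffs 3, 5, and 1, the nodes must weight the next stage at least half as much as the current one for relaying to be self-enforcing.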

In [88] and [94], only one relay node is considered. However, 
multiple relay nodes are more common in practice, in which 
case the half-duplex AF protocol cannot be used. To address 
this issue, the authors in [90] used a non-orthogonal protocol 
that allows multiple relay nodes to transmit packets to the 
destination over the same frequency band at the same time, as 
illustrated in Fig. 8(b). In the proposed system model, the 
relay network is modeled as an exchange market game in 
which the nodes trade their transmission power to gain data 
rate. A cooperation cycle formation algorithm is proposed 
to help the nodes achieve the strict core of the game. The concept 
of the “strict core” was introduced in [91], [92] with the aim of 
achieving a Pareto optimal, individually rational, and 
strategy-proof solution [93]. Additionally, to punish selfish users and 
revive the cooperation, a dynamic punishment strategy is 
presented, under which full cooperation is proven to be a 
Pareto-dominant equilibrium. 

2) Decode-and-Forward Relaying Transmission: While the 
AF cooperative transmission is a better choice for uncoded 
systems, the DF relaying protocol is especially appropriate 
for the systems using powerful capacity-approaching codes. 
In DF relay networks, when a relay node receives a request 
from a source node, it can choose to cooperate by relaying the 
decoded (and then re-encoded) data, or dismiss the request. 
Similar to AF relaying systems, repeated games are also 
used to model and encourage the cooperation. In [89], the 
authors studied DF relaying networks under fading and 
non-fading channel scenarios. In the case of non-fading channels, 
the authors proved that there exists a mutually cooperative 
Nash equilibrium between two nodes when the game lasts for 
multiple periods. In the case of fading channels, the authors 
used a Markov chain to model the status of the channel with 
two states, namely, BAD and GOOD. Moreover, it is assumed 
that the players abide by the following strategy: if a player 
does not cooperate in the current period, it will be punished 
in the next K − 1 periods. Since the payoff is a function 
of the uplink signal-to-noise ratio, the authors examined two 
types of functions, i.e., convex and concave payoff functions. 
For convex payoff functions, it is possible to find a 
cooperative Nash equilibrium for the repeated game. However, 
for concave payoff functions, a mutually cooperative Nash 
equilibrium may not exist under bad channel conditions. 
Then, two specific concave payoff functions are examined, 
namely, Shannon capacity and transmission success rate. The 
authors demonstrated that with these functions, cooperation 
can be achieved by setting an appropriate weight for the future 
payoff. 
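
The role of the weight on future payoffs under a finite punishment can be illustrated with a simple check. This is a generic sketch, not the analysis of [89]: it assumes constant stage payoffs and ignores the channel state. A deviation gains `u_dev - u_coop` once, and costs `u_coop - u_punish` in each of the next K − 1 periods, discounted by `delta` per period. All names are hypothetical.

```python
def deviation_is_unprofitable(u_coop, u_dev, u_punish, delta, K):
    """True if the discounted punishment outweighs the one-shot gain."""
    gain_now = u_dev - u_coop
    # Loss accumulated over the K - 1 punishment periods that follow.
    loss_later = sum(delta ** k * (u_coop - u_punish) for k in range(1, K))
    return gain_now <= loss_later
```

With illustrative payoffs 3, 5, and 1 and a weight of 0.9, one punishment period (K = 2) is not enough to deter deviation, whereas two periods (K = 3) are.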

C. Resource Sharing in Peer-to-Peer Networks 

The goal of P2P networks is to share resources among 
participants with little or no support from a centralized 
server. In ED, the authors considered the interaction between 
two participants (i.e., players). Similar to relay networks, in 
P2P networks, a player uses a service provided by other 
players. However, providing the service also incurs a certain 
cost. Using a one-shot game to model this situation, it is shown 
that no player will cooperate. However, with repeated 
games, the cooperation can be sustained. In [97], the authors 
highlighted that noise or link failure can make players appear 
to misbehave, and thus they proposed an improvement called the 
proportion increment TFT strategy. With this improvement, 
after a player is treated as a deviator because of noise, the 
cooperation probability of the peers is updated based on a 
cooperation ratio function, and the probability converges to 
the cooperation point between the two peers if both peers continue 
cooperating. The results show the efficiency of the proposed 
strategy compared with the classical TFT strategy, and it can 
even reduce the adverse effect of malicious behavior in 
P2P networks. 

In [97], the authors assumed that the requests from
participants are identical, i.e., the same quality-of-service and the
same cost and profit. In contrast, the authors in [98] studied
a resource sharing problem in a wireless live streaming social
network for users with heterogeneous types. Each user has
a type from {laptop, PDA, cellphone} and a buffer to store
content. Video streaming from a service provider is divided
into chunks, and users of different types have different
costs of sharing and different gains from the received chunks.
In each round, the users report their buffer information to
each other, and then they send their chunk requests. After
receiving a request, a user decides how many chunks
it will transfer. The payoff is the gain
minus the cost of transferring. The users do not know their
opponents’ types, but they hold beliefs about them. By using a Bayesian
repeated game, the authors showed that there exist an infinite
number of Bayesian-Nash equilibria (BNE). However, not all
the BNE are efficient, and thus the authors proposed different
solutions including Pareto optimality, bargaining solution, and
cheat-proof cooperation strategies with the aim of encouraging
users to cooperate for better performance. The authors finally
concluded that to maximize the payoffs and to circumvent
cheating behaviors, the players should always agree to send
chunks as shown in the time-restricted bargaining strategy quota.

In [99], the authors extended the two-player game model
proposed in [98] to multiple players. When a user requests
chunks from the other users at different time slots to maximize
its payoff, the requests for chunks may not be received
simultaneously. Therefore, a repeated game is inapplicable and the
solutions used in the two-player game cannot be used directly.
The authors proposed a multiuser cheat-proof cooperation
strategy, and in [100], it is proven that this strategy is a
subgame perfect and Pareto-optimal Nash equilibrium. The
request-answer and chunk-request algorithms were also proposed.
They give larger rewards to the users who share more video
chunks, which encourages users to cooperate. The results
demonstrate that with the proposed strategy, the users
have an incentive to cooperate and the network becomes
cheat-free and attack-resistant.

Similar to [98] and [99], the buffer cheating attacks in
P2P wireless live video streaming systems were considered
in [103]. Instead of using a bargaining cheat-proof cooperation
strategy, the authors in [103] considered proportional fairness
optimality criteria and proposed a cheat-proof strategy to
deter buffer cheating attackers. While the cheat-proof strategy
in [98] is based on hiding private information, so that users
have to bargain over the number of chunks that they exchange, the
cheat-proof strategy in [103] is based on a fairness solution. In
particular, a user gains nothing if it does not share any chunk.
With the proposed strategy, the authors concluded that a user
will not cheat if its opponent did not refuse to cooperate
in the previous round or if there is no useful chunk in the
opponent’s buffer.

While in the above papers, repeated games are used to
model the interaction between users in P2P networks, in (ED,
the authors used a repeated game to study the relation between
a content server and users. The content server needs to assign a
reward to users, while the users have to decide the transmission
rate to forward packets received from the content server. It is
assumed that the content provider and users follow a Stackelberg
game [133] in which the content server (i.e., a leader)
assigns the reward first, and then the users (i.e., followers)
choose the transmission rate based on the reward. The authors
showed that for the one-shot Stackelberg game, the Stackelberg
equilibrium can be achieved through the backward
induction technique [133]. The authors then extended the static
Stackelberg game model to a repeated Stackelberg game
model. It is found that the users may be motivated to deviate
from the Stackelberg equilibrium. Specifically, the users may
refuse to forward packets for the content server if it does not
offer rewards higher than those of the Stackelberg equilibrium.
With the repeated game, if the potential benefit is large or if the
users care much about the future reward, they will have a higher
incentive to urge the content server to offer rewards higher
than those of the Stackelberg equilibrium. Moreover, a cheating
prevention mechanism [102] is used to deal with the cheating
problem from users.
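
The backward induction step for a one-shot leader-follower game can be sketched as follows. This is a toy instance under assumed utilities, not the model of the cited work: the follower is assumed to maximize `reward * x - cost * x**2` over its rate `x`, and the leader is assumed to earn `(gain - reward)` per unit of rate. The function names, the quadratic cost, and the numbers are all hypothetical.

```python
def follower_best_rate(reward, cost=1.0):
    """Follower maximizes reward * x - cost * x**2, so x* = reward / (2 * cost)."""
    return reward / (2.0 * cost)


def leader_utility(reward, gain, cost=1.0):
    """Leader earns (gain - reward) per unit of the rate the follower will choose."""
    return (gain - reward) * follower_best_rate(reward, cost)


def stackelberg_equilibrium(gain, candidate_rewards, cost=1.0):
    """Backward induction: anticipate the follower's reply, then pick the best reward."""
    best_reward = max(candidate_rewards, key=lambda r: leader_utility(r, gain, cost))
    return best_reward, follower_best_rate(best_reward, cost)


rewards = [i / 10 for i in range(0, 41)]  # candidate rewards 0.0, 0.1, ..., 4.0
reward_star, rate_star = stackelberg_equilibrium(gain=4.0, candidate_rewards=rewards)
```

The follower's reply is solved first and substituted into the leader's problem, which is the order of reasoning that backward induction prescribes; in a repeated setting the followers may instead threaten to withhold forwarding, which is why the one-shot outcome need not persist.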

D. Miscellaneous Issues 

Besides the use of repeated games for the aforementioned 
problems in different types of wireless ad hoc networks, there 
are also many other applications of repeated games that will 
be discussed in this section. 

1) Multiple Access: In [104], the authors studied a multiple
access game in ad-hoc networks. The authors proposed the
modified multiple access game (MMAG) by adding a regret
cost when no user transmits. The MMAG possesses two pure
Nash equilibria that are also Pareto optimal. If the MMAG is
played repeatedly, there exists a subgame perfect Nash equilibrium.
Additionally, to improve the network performance, the
authors studied the relation between the regret cost and the
transmission probability. The results demonstrate that based on
this cost, other parameters can be adjusted to achieve efficient
equilibria.
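
To make the two pure Nash equilibria concrete, the sketch below enumerates the equilibria of a two-user access game with a regret cost. The payoff table is a textbook-style assumption (a successful transmission is worth 1, transmitting costs `cost`, and joint silence costs `regret`); it is not taken from the cited work, and the numbers are hypothetical.

```python
from itertools import product

ACTIONS = ("T", "W")  # transmit or wait


def mmag_payoffs(cost, regret):
    """Payoff table of a two-user access game with a regret cost for joint silence."""
    return {
        ("T", "T"): (-cost, -cost),      # collision: both pay the cost, nobody succeeds
        ("T", "W"): (1.0 - cost, 0.0),   # user 1 succeeds
        ("W", "T"): (0.0, 1.0 - cost),   # user 2 succeeds
        ("W", "W"): (-regret, -regret),  # nobody transmits: both pay the regret cost
    }


def pure_nash_equilibria(payoffs):
    """Return all action profiles from which no user gains by deviating alone."""
    equilibria = []
    for a1, a2 in product(ACTIONS, repeat=2):
        u1, u2 = payoffs[(a1, a2)]
        best1 = all(u1 >= payoffs[(d, a2)][0] for d in ACTIONS)
        best2 = all(u2 >= payoffs[(a1, d)][1] for d in ACTIONS)
        if best1 and best2:
            equilibria.append((a1, a2))
    return equilibria


equilibria = pure_nash_equilibria(mmag_payoffs(cost=0.3, regret=0.2))
```

With these assumed payoffs, the two equilibria are the profiles in which exactly one user transmits, so a repeated-game strategy can alternate between them to share the channel.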

A similar scenario is also studied in [105]. However, to
refine Nash equilibria to an efficient equilibrium for better
fairness, social welfare maximization, and Pareto optimality,
the authors in [105] proposed a searching algorithm. The TFT
strategy and generous TFT strategy are also proposed to detect
selfish nodes and attract them to cooperate. The authors
then extended the game model to a multihop network scenario.
It is shown that although the globally optimal solution
may not be achieved, a Pareto optimal equilibrium
can still be obtained through the proposed game model.

Qu et al. [112] examined the multiple access control scheme
in wireless sensor networks using a TDMA protocol. A
repeated game is applied to determine the transmission strategy.
The players are the sensors and they send information to
the datacenter. Specifically, each sensor will report its energy
level and queue state to the datacenter. Based on the received
information, the datacenter selects only the sensor which has
the highest measured state to transmit all packets in its data
queue to the datacenter in the rest of the time slot. The payoff
of the sensor is the number of packets successfully transmitted.
However, the sensors may report fake information to obtain
additional benefits. The authors designed a mechanism to
detect selfish sensors and used a punishment strategy to force
selfish sensors to cooperate. If a defecting sensor is detected,
all sensors will play a noncooperative game by reporting their
highest state to the datacenter for the next L steps before they
re-cooperate. Nevertheless, the priority of different sensors
should be taken into account in the mechanism.

2) Clustering: In wireless ad hoc networks, due to the large
number of wireless nodes, one of the efficient ways to reduce
the energy consumption of the nodes is to use a clustering
technique. In [106], the authors considered a clustering problem
in mobile ad hoc networks. The authors first proposed an
algorithm to cluster the nodes into groups. Then, the algorithm
finds a cluster head for each group. The cluster head is able to
communicate with all the nodes in the cluster. However, since
the cluster head consumes more energy than the others, the
nodes may be unwilling to send true information in order to avoid
becoming a cluster head. To resolve this issue, the authors
used a repeated game to model the interaction among nodes.
In the game, the nodes are the players, and they can choose
one of two actions, i.e., “honestly” or “dishonestly” reporting
information. The payoff of a node is the gain that is received
from successfully transmitting packets. To detect selfish nodes
and force the nodes to cooperate, the authors proposed a
limited punishment mechanism. It is shown that the nodes
have no incentive to deviate from the cooperation, and thus
they report information honestly.

3) Security: In the context of wireless sensor networks, the
authors in [109] used a repeated game to prevent
denial-of-service (DoS) attacks from malicious sensor nodes. The
repeated game is to model the interaction between the intrusion 
detection (ID) node and a set of sensor nodes. The ID node 
acts as a base station with the objective to monitor behaviors 
of sensors and punish them when they defect through using 
reputation as presented in nm Therefore, the ID node will 
have two actions, namely, punish or not punish. In contrast, each
sensor will have two actions, i.e., cooperate or defect. When
the sensor cooperates by forwarding packets, it will not be 
punished, and thus it will gain rewards. However, if the sensor 
defects and it is detected by the ID node, it will be punished 
forever and will be out of the cooperation. Consequently, 
this sensor will gain nothing after leaving the cooperation, 
while other sensors still in the cooperation gain rewards. The 
advantage of this technique is that it handles not only selfish 
players, but also malicious users who are aiming to damage 
others. 

Similar to [109], the authors in [111] also considered the
interaction between the ID node and a set of sensors. However,
while in [109] a selfish/malicious node is punished forever
after it is detected, in [111] the node is punished for
an appropriate number of periods. Then, this node can
participate in the cooperation again if it agrees to forward
packets from the other nodes. This scheme gives selfish
nodes an opportunity to re-cooperate in order to
improve the network performance.

[Figure 9 panel labels: Types of game; Solutions]

Fig. 9. Relation among types of repeated games, strategies, and solutions (RG = repeated game, SPE = subgame perfect equilibrium, TFT = Tit-for-Tat) in
wireless ad hoc networks.

4) Multi-network Cooperation: The authors in nna
considered the interaction between two wireless ad hoc networks
using a repeated game. In this game, the players are
two wireless operators. Each operator has a set of wireless
nodes, and if the two operators cooperate by sharing nodes for
relaying traffic, they can increase the QoS performance of
the entire network. In the cooperation, each operator determines
how many nodes and which nodes should be shared. The payoff
is defined as a function of its cost (e.g., bandwidth usage).
In the repeated game, with a trigger strategy (that is similar
to the grim strategy) and some appropriate conditions on the
discount factor, there exists a subgame perfect equilibrium.
Furthermore, to find the optimal cooperation action profile for
both networks, the Nash bargaining solution [108] is applied.
The results show that the proposed solution can achieve
performance close to that of a fully cooperative approach. The
system model can be extended by considering different types
of networks, e.g., a relay cellular network and a multihop WiFi
network.
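
As a minimal illustration of how a Nash bargaining solution selects a cooperation profile, the sketch below picks, among a finite set of candidate profiles, the one that maximizes the product of both operators' gains over a disagreement point. The profile names, payoffs, and disagreement values are hypothetical, and the discrete search is a simplification of the general (continuous) bargaining problem.

```python
def nash_bargaining(profiles, disagreement):
    """Pick the profile maximizing the product of gains over the disagreement point."""
    d1, d2 = disagreement
    # Keep only profiles that leave both operators at least as well off as not cooperating.
    rational = {name: (u1, u2) for name, (u1, u2) in profiles.items()
                if u1 >= d1 and u2 >= d2}
    return max(rational,
               key=lambda name: (rational[name][0] - d1) * (rational[name][1] - d2))


# Hypothetical payoffs (operator 1, operator 2) for three node-sharing profiles.
profiles = {
    "share 1 node each": (3.0, 1.5),
    "share 2 nodes each": (2.5, 2.5),
    "share 3 nodes each": (1.5, 3.0),
}
chosen = nash_bargaining(profiles, disagreement=(1.0, 1.0))
```

The product criterion favors the balanced profile here because a lopsided split shrinks one of the two factors, which is the sense in which the bargaining solution trades off efficiency and fairness.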

Summary: From Table III, we observe that there are
many repeated game models developed for wireless ad hoc
networks. The majority of them address the packet forwarding
problem. However, in wireless ad hoc networks, there
are also many other important problems, e.g., quality-of-service,
multicasting, and security, as presented in [46] and ATI, which
are not well examined using repeated games. Moreover, from
Fig. 9, we observe that compared with the repeated game
models for cellular networks and WLANs, those for wireless ad hoc
networks use more diverse variations of repeated games and
solution concepts.


V. Applications of Repeated Games in Cognitive 
Radio Networks 

This section discusses the applications of repeated games in 
cognitive radio networks (CRNs). CRNs are intelligent
communication networks that have been designed to improve
spectrum utilization and transmission efficiency. CRNs allow
unlicensed users, called secondary users, to opportunistically
access available spectrum allocated to licensed users, called
primary users E3- In CRNs, before accessing a licensed 
channel, unlicensed users need to check the channel state and 
then decide whether to access the channel or not. Therefore, 
the applications of repeated games in CRNs mainly focus on 
solving two major problems, namely, spectrum sensing and 
spectrum usage. Additionally, there are some applications to the
pricing competition problem, called spectrum trading, which are
also reviewed in this section.

A. Spectrum Sensing 

Spectrum sensing is a task of secondary users (SUs) to
detect the presence of primary users (PUs) through
sensing techniques such as energy detection, matched filter
detection, and cyclostationary feature detection as presented
in EE). The effectiveness of spectrum sensing is limited by
geographic separation or channel fading, and thus SUs need
to cooperate to overcome this problem. However, SUs may
not be interested in cooperating since they have to exchange
sensing results, which consumes a certain amount of resources.
Consequently, a repeated game is a useful tool to motivate
the SUs to cooperate with the aim of improving the quality of
spectrum sensing, thereby enhancing the network performance.

TABLE III
Applications of repeated games in wireless ad-hoc networks

In [115], the authors proposed using repeated games to
model the interaction among secondary users (SUs) when they
want to cooperate to sense a common spectrum allocated to a
primary user. In this game, the SUs have two actions, namely,
cooperate or not cooperate. If the SUs are cooperative, they
will share their spectrum sensing results with others. However,
this consumes energy due to broadcasting information. By
contrast, if they are not cooperative, they will gain and lose
nothing. When the profit obtained from sharing sensing results
(i.e., b) is lower than the cost for broadcasting information (i.e.,
c), the SUs will never be cooperative. Otherwise, for b > c,
there is an incentive for the SUs to cooperate if the interaction
among SUs is repeated for sufficiently many periods. In the
repeated game, the authors investigated two strategies, namely,
grim and carrot-and-stick, for two scenarios, i.e., with and
without transmission loss. With transmission loss, the
SUs may not receive information from others. Consequently,
the SUs will not cooperate and their performance will
degrade. It is shown that as the packet loss probability
increases, the ratio b/c must be increased to guarantee
cooperation among SUs for both strategies. Furthermore,
the authors concluded that the value of b under the
carrot-and-stick strategy must be twice that under the grim
strategy to make SUs cooperate.
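
The requirement that the interaction be sufficiently long-lived can be quantified for the grim strategy under a simple assumed stage game. The sketch below is a textbook-style illustration, not the analysis of the cited work: it assumes that mutual cooperation yields b - c per period, that a deviator collects b once and zero afterwards, that there is no transmission loss, and that future payoffs are weighted by a hypothetical discount factor `delta`.

```python
def grim_discount_threshold(b, c):
    """Smallest discount factor sustaining cooperation under grim trigger, or None."""
    if b <= c:
        return None  # sharing does not pay, so no discount factor helps
    return c / b


def grim_sustains_cooperation(b, c, delta):
    """Compare cooperating forever with defecting once and being punished forever."""
    cooperate_forever = (b - c) / (1.0 - delta)  # geometric sum of b - c per period
    defect_once = b                              # gain b now, then zero in every later period
    return cooperate_forever >= defect_once


threshold = grim_discount_threshold(b=2.0, c=1.0)
```

Rearranging (b - c) / (1 - delta) >= b gives delta >= c / b, so under these assumptions a larger ratio b/c lowers the patience needed for cooperation, in line with the role of b/c described above.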

In [116], Kondareddy et al. extended the cooperative sensing
model to packet relay in CRNs. The authors formulated a
cross-layer game which is a combination of a cooperative
sensing game and a packet forwarding game. In this cross-layer
game, the SUs choose from two independent pairs of
actions, {Share (S), Do not Share (DS)} and {Forward (F), Do
not Forward (DF)}. Therefore, there are four possible combined
actions for each player to choose from, i.e., S-F, S-DF, DS-F, and DS-DF.
The payoff is the profit. It is shown that for the one-shot
cross-layer game, the Nash equilibrium is mutual defection, i.e.,
the players will choose actions (DS-DF), and this leads to
poor network performance. Nevertheless, if the cross-layer
game is repeated, by using the TFT strategy and under some
conditions on the discount factor, the authors proved that the
subgame perfect Nash equilibrium is the mutual cooperation
strategy for the SUs. The most noticeable achievement of
the repeated cross-layer game is that the players can achieve
cooperation in both spectrum sensing and packet forwarding
by using punishments for the packet forwarding action only.
Moreover, the authors considered imperfect observations that 
may happen by mistake or due to noise. The game is modeled 
as the Prisoner’s Dilemma with noise nm By using the 
generous TFT strategy cm the authors showed that the 
mutual cooperation strategy for the SUs is the subgame perfect 
Nash equilibrium. 
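
The effect of observation noise on TFT-style strategies can be explored with a small simulation. The sketch below is a generic illustration and not the model of the cited work: two players each repeat what they believe the other did last, each observation is flipped with probability `noise`, and a perceived defection is forgiven with probability `generosity` (zero gives plain TFT). The parameter names and values are assumptions.

```python
import random


def simulate(generosity, noise, rounds=10_000, seed=0):
    """Return the fraction of cooperative actions played by two (generous) TFT players."""
    rng = random.Random(seed)
    last_seen = ["C", "C"]  # what each player believes the other did in the last round
    cooperations = 0
    for _ in range(rounds):
        actions = []
        for me in (0, 1):
            # Cooperate after a perceived cooperation, or forgive with some probability.
            if last_seen[me] == "C" or rng.random() < generosity:
                actions.append("C")
            else:
                actions.append("D")
        cooperations += actions.count("C")
        for me in (0, 1):
            observed = actions[1 - me]
            if rng.random() < noise:  # imperfect observation flips what is seen
                observed = "D" if observed == "C" else "C"
            last_seen[me] = observed
    return cooperations / (2 * rounds)


plain_tft = simulate(generosity=0.0, noise=0.05)
generous_tft = simulate(generosity=0.3, noise=0.05)
```

Without noise both variants cooperate in every round; with noise, plain TFT tends to get locked into retaliation after a single misread action, while the forgiving variant recovers, which is the motivation for using a generous TFT strategy under imperfect observation.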

In [115] and [116], the authors presented solutions to
deal with selfish SUs. However, there can be malicious SUs
sharing wrong information. For example, when the primary
channel is idle, a malicious SU can report a busy channel,
and hence no other SU can use this channel. In [118], a
repeated game model was developed for this problem. The
game is presented in Table IV. U_h and U_d are the profits when
the SUs choose the co and no (i.e., cooperate and not cooperate)
actions, respectively. C_s is the cost when SUs participate in
spectrum sensing and C_r is the cost for overhearing the sensing
information from other SUs. From this payoff matrix, a
malicious SU will gain a negative profit if it is noncooperative.
Consequently, the malicious SU has no incentive to defect. A
distributed trust model is used to identify the malicious SU
through the evaluation of the SUs’ reputation. Based on the action
history of SUs, an SU can update the global reputation of
all its neighbors. If the global reputation of any SU reaches a
predefined limit, this SU will be identified as malicious
and will be punished forever.


TABLE IV
Payoff Matrix

                      cooperate (co)                        not cooperate (no)
cooperate (co)        U_h - C_r - C_s, U_h - C_r - C_s      -C_s - C_r, U_d - C_r
not cooperate (no)    U_d - C_r, -C_s - C_r                 -C_r, -C_r
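
As a worked reading of Table IV (a sketch based only on the entries listed in the table, not a result quoted from the cited work), mutual cooperation is a stage-game Nash equilibrium when a unilateral switch from co to no does not raise a player's payoff:

```latex
U_h - C_r - C_s \;\ge\; U_d - C_r
\quad\Longleftrightarrow\quad
U_h - C_s \;\ge\; U_d .
```

In the same table, mutual non-cooperation gives each SU a payoff of -C_r, which is negative whenever C_r > 0, consistent with the negative profit noted in the text.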


B. Spectrum Usage Management 

After spectrum sensing, spectrum usage or spectrum access 
will be performed. In this section, we review the applications 
of repeated games to help secondary users (SUs) to access 
available channels. We discuss four major problems which are
illustrated in Fig. 10.


[Figure: Spectrum Usage Management, with branches Spectrum Interference, Opportunistic Spectrum Access, Spectrum Sharing, and Spectrum Clustering]

Fig. 10. Spectrum usage issues.

1) Opportunistic Spectrum Access: Unlike multiple access
control in cellular networks and WLANs presented in Section III-A,
in CRNs, when an SU accesses a licensed channel, it takes
into account not only the existence of other SUs, but also the
presence of PUs. Thus, spectrum access in CRNs is more
dynamic and more complex than that in cellular networks and
WLANs. One of the popular solutions is using a Markov
model for analysis and optimization. Then a repeated game is
used as a tool to encourage cooperation among SUs.

In [119], the authors developed a distributed dynamic
spectrum access scheme for SUs and modeled the noncooperative
relation among SUs as an infinitely repeated game. The players
are the SUs, the actions are to access the idle channel or
not, and the payoff is the average throughput. Since the SUs
can access only an idle channel, the authors used a
primary-prioritized Markov dynamic spectrum access [120] to analyze
the impact of PUs on SUs and derive the optimal channel
access probabilities of the SUs. Then, by using the Folk theorem, it is
shown that with the discount factor close enough to one, the
game can achieve an efficient Nash equilibrium. To do so,
the authors introduced a distributed self-learning algorithm
for the SUs to obtain the optimal access probabilities. In the
learning algorithm, the SUs start with low channel access
probabilities. In each iteration, the SUs increase the channel
access probability and observe the payoff. If the payoff is
improved, then the SU will continue increasing the channel
access probability. Otherwise, the SU stops. However, the
convergence of the learning algorithm is open for further study.
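
The self-learning procedure described above amounts to a simple hill-climbing loop, sketched below. The step size, the starting probability, and the toy payoff `toy_payoff` (throughput p(1 - p), as if one rival mirrored the same access probability) are hypothetical choices made only to obtain a runnable example; they are not taken from the cited work.

```python
def self_learning_access(payoff, p_start=0.05, step=0.05, p_max=1.0):
    """Raise the access probability step by step while the observed payoff improves."""
    p = p_start
    best = payoff(p)
    while p + step <= p_max:
        candidate = p + step
        observed = payoff(candidate)
        if observed <= best:
            break  # no improvement: the SU stops increasing its access probability
        p, best = candidate, observed
    return p


def toy_payoff(p):
    """Toy throughput: the SU succeeds when it accesses and its mirror rival does not."""
    return p * (1.0 - p)


learned_p = self_learning_access(toy_payoff)
```

Because the loop stops at the first non-improving step, it can only be expected to find the best probability when the payoff rises and then falls along the search path, which is one reason the convergence of such a procedure deserves separate study.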

In [119], a Markov model was developed to optimize
channel access probabilities for SUs. However, when the number
of SUs is large, the computational complexity increases
exponentially, which causes difficulties in finding optimal policies
for the SUs. Therefore, the authors in [121] proposed a
reduced-complexity suboptimal Markov model. The method
is to separate users’ behaviors, thereby significantly reducing
the state space of the Markov model. To motivate the SUs to
cooperate, a repeated game is used. Additionally, the authors
used a learning algorithm based on the policy gradient
method to update the channel access probability for the SUs
and proposed solutions to detect deviating users and force
them to cooperate, which were not addressed in [119].

While [119] and [121] used learning algorithms to control
channel access probabilities, in [122], the authors examined
the CSMA/CA protocol to resolve collisions when SUs access
channels. In the CSMA/CA protocol, when a player transmits
data and a collision occurs, it selects a random number
w and retransmits after w time slots. Selfish users can take
advantage by reducing the parameter w, thereby increasing
the channel access probability. Thus, the authors used a
detection method based on the evaluation of average throughput.
In particular, within a monitoring duration, if a player i
detects that player j has an average throughput higher than
i’s throughput, player i will notify all other players in the
network. Then, the honest players will punish the selfish player
j by jamming j’s transmission for a certain number
of subsequent game stages. The authors then established the conditions
on the monitoring duration and punishment period such that
the punishment will be effective to deter the deviation.

2) Interference Control: In underlay spectrum access, i.e.,
when SUs access a common channel simultaneously, mutual
interference occurs. To control the impact of interference, the
transmission power levels at the transmitters have to be carefully
optimized. There are two types of networks studied in the
literature, namely, unlicensed and licensed band networks. In
unlicensed networks, SUs need to control the interference
only with other SUs. Conversely, in licensed networks, SUs
have to avoid interference with PUs as well.

a) Unlicensed band networks: In [123], Etkin et al.
studied the spectrum sharing problem among SUs for
interference-constrained networks. The authors considered the scenario
with M systems (SUs) coexisting in the same area. Each
system includes a pair of a transmitter and a receiver. The
systems share the same unlicensed channel, and this is
modeled as a Gaussian interference channel. If the systems are not
cooperative, they will transmit data with high power levels,
which leads to severe interference and significantly reduces
network throughput. Therefore, a repeated game is used to
encourage cooperation. In this game, the players are the SUs,
the actions are transmission rates, and the payoff is the sum
rate of the SUs. The TFT strategy is used to punish selfish users.
Based on the Folk theorem, it is proven that the proposed
strategy achieves the subgame perfect Nash equilibrium when
the discount factor is set sufficiently close to one.

To deal with liars who may gain benefit by communicating
fake information (e.g., fake channel measurements), Etkin et
al. extended their work in [123] by introducing a truth telling
method [124]. This method is based on a protocol with
detection mechanisms using test messages exchanged among
SUs together with energy detection. Thus, the repeated game
is modified by adding an initial stage to exchange and verify
the channel measurement in all stages of the repeated game.
If any deviation is detected, a punishment is triggered
in which all the transmitters spread their power over the total
bandwidth for the rest of the game. The proposed method detects
not only deviations, but also liars who send fake
information. However, this method requires extensive information
exchange, which could be a shortcoming for energy-limited
networks.

In [123] and [124], after a deviation is detected, all users
will play a noncooperative game forever. This strategy can be
inefficient. To overcome this drawback, the authors in El
adopted a different strategy, namely, punish-and-forgive. In this
strategy, after detecting the deviation, the noncooperative game
will be played for the next T - 1 periods, and then the players
will cooperate again. Besides the cooperation criterion which
maximizes the total throughput (MTT) for the network as
presented in [123] and [124], the authors in [126] considered
a different cooperation criterion, i.e., the approximated proportional
fairness (APF) criterion, to help the players with poor channel
conditions have an equal opportunity to access a channel. The
method to detect liars in um is proven to be less complex
and easier to implement than that in [124].

To reduce interference and improve the network performance
for SUs, the authors in E3 considered using multi-carrier
transmission and studied the interactions among SUs
communicating over the same frequency band composed of
multiple carriers. In the multi-carrier system, the SUs have
to choose the transmission power levels on each carrier to
maximize their own payoff. Again, a repeated game is used
to encourage the SUs to cooperate. In this game, the SUs
cooperate by using Pareto transmission strategies. If any SU
deviates from the cooperation, then the other SUs will choose
the noncooperative Nash equilibrium strategy for the remainder
of the game.

In all of the above work, to control interference, SUs have to adjust
their transmission power levels. In [128], the authors used an
intervention scheme [129] for repeated games with the aim
to enlarge the limit set of equilibrium payoffs and loosen the
condition that the discount factor needs to be close to
one in general repeated games. In the intervention scheme,
an intervention device is introduced to observe and intervene
in interactions among SUs. Specifically, the intervention device
monitors the actions of SUs and then makes decisions (e.g.,
punishment) to deter deviation and enforce cooperation. With
the intervention device, in the repeated game, the players
include the SUs and the intervention device. The actions are
transmission power levels and the payoff is throughput. The
protocol is proposed to maximize a joint objective function
(e.g., total throughput of the network). This protocol is shown
to achieve the subgame perfect equilibrium of the repeated
game. The authors proposed a trigger policy that makes players
cooperate. If any user deviates from the cooperation, it will
be punished for a certain number of periods. Simulation results
show good performance of using intervention in terms of sum
payoff and max-min fairness. Additionally, with the support
of the intervention, the set of equilibrium payoffs is extended
and the discount factor condition can be reduced by around
50%.

b) Licensed band networks: To reduce the interference
with PUs on a licensed band, the authors in [127] adopted a
local spectrum server (LSS) as presented in Fig. 11. The LSS
controls the interference between the SUs and the PU. The PU
specifies an acceptable interference temperature to the LSS. 
The LSS also monitors the activities of SUs in which the 
monitoring is imperfect. Thus, a repeated game with imperfect 
monitoring m is used. After the SUs transmit data, the LSS 
measures and compares the interference with a predefined 
threshold. If the interference is higher than the threshold, the 
LSS will send a warning message to the SUs. Based on the 
received information from the LSS, the SUs will adjust power 
levels accordingly. The authors then introduced a deviation- 
proof policy implemented in a distributed fashion for the SUs. 
In the policy, the SUs compute indexes that measure “urgency” 
for their data transmission. The SU with the highest index 
is chosen to transmit data in a time slot. It is proven that 
with the proposed policy, the SUs will achieve an optimal 
operating point that will give them no incentive to deviate and 
the proposed policy converges to a perfect public equilibrium. 
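
The feedback loop between the LSS and the SUs can be sketched as follows. This is a simplified illustration under stated assumptions, not the policy of the cited work: the interference at the LSS is assumed to be the sum of each SU's transmit power times its channel gain to the LSS (g_i0 in the notation of Fig. 11), the measurement is assumed noise-free although the text notes that monitoring is imperfect, and the urgency indexes are hypothetical numbers.

```python
def interference_at_lss(powers, gains_to_lss):
    """Aggregate interference measured at the LSS: sum of p_i * g_i0 over the SUs."""
    return sum(powers[i] * gains_to_lss[i] for i in powers)


def lss_warning(powers, gains_to_lss, threshold):
    """The LSS warns the SUs when the measured interference exceeds the threshold."""
    return interference_at_lss(powers, gains_to_lss) > threshold


def select_transmitter(urgency):
    """The SU with the highest urgency index transmits in the time slot."""
    return max(urgency, key=urgency.get)


powers = {1: 0.5, 2: 0.2}        # hypothetical transmit powers of SU 1 and SU 2
gains_to_lss = {1: 0.1, 2: 0.5}  # hypothetical channel gains g_10 and g_20
warning = lss_warning(powers, gains_to_lss, threshold=0.1)
chosen_su = select_transmitter({1: 0.3, 2: 0.9})
```

Letting only the SU with the highest index transmit removes simultaneous transmissions in a slot, so the warning signal from the LSS then reflects the power choice of a single SU.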


Fig. 11. An example of using a local spectrum server. g_ij is the channel gain
between entities i and j, where 0 is the local spectrum server.

3) Spectrum Sharing: Spectrum sharing is another method
used to prevent collision and interference. The idea is to divide
the spectrum into non-overlapping sub-channels and then allocate
them to users. However, different from bandwidth allocation
in cellular networks and WLANs, which have a central node
(e.g., an access point or base station) to control the allocation,
in CRNs, the SUs have to agree on or self-define the amount of
used spectrum without any support from a central node. Thus,
SUs may be selfish and they may use spectrum excessively,
thereby significantly degrading the network performance. In
this section, we review the applications of repeated games for
spectrum sharing.

In [130], the authors considered the spectrum sharing
problem between two WLANs when they coexist
in the same area and share a common unlicensed spectrum
to serve their users. The WLANs need to compete with
each other to use the spectrum. A repeated game is used to
encourage the WLANs, as the players, to cooperate. The actions of
the WLANs are choosing the amount of spectrum. An action is
defection if it maximizes the payoff. Alternatively, an action
is cooperation if the MAC parameters of the WLAN are
less aggressive and take opponents’ actions into account. The
payoff is the QoS measure of the players in terms of achievable
throughput, period length, and delay. The authors considered
static strategies, i.e., always cooperate, always defect, and a
random policy (50% cooperate and 50% defect), and dynamic
strategies, i.e., grim and TFT. Numerical results show that
different strategies are appropriate for different circumstances
depending on the QoS requirements. For example, when the 
QoS requirements of player 1 is (throughputs. 1, period 
length=0.05, and delay=0.02) and for player 2 is (through- 
putS.4, period lengthS.031, and delayS.02), the authors 
showed that the best strategy for player 1 is always cooperate. 
However, the best strategy for player 2 is TFT. 

In [136], the authors studied a cognitive radio network with 
two SUs sharing a common unlicensed channel. The SUs can 
cooperate by each using half of the bandwidth through the 
frequency division multiplexing (FDM) technique, or not 
cooperate by spreading their transmit power (STP) over the 
entire bandwidth. The authors proposed a scheme to detect the 
selfishness of a rival player and then designed a guard interval 
for the SUs: if a player starts by cooperating and, after the 
guard interval, the rival player still does not cooperate, the 
cooperative player will switch to STP (i.e., not cooperate). 
Through analysis and simulation, it is shown that the proposed 
strategy can achieve mutual cooperation and that the 
equilibrium is stable. 
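
The guard-interval rule can be sketched as a simple decision 
function. This is only one possible reading of the rule described 
above: the sliding-window test, the default `guard_interval` of 
three slots, and the function name are assumptions, not details 
given in [136]. 

```python
FDM, STP = "FDM", "STP"


def guard_interval_strategy(rival_history, guard_interval=3):
    """Choose FDM (cooperate) or STP (not cooperate) for the next slot.

    rival_history: the rival's past actions, oldest first.
    guard_interval: hypothetical number of slots the player tolerates
        a non-cooperative rival before giving up.
    """
    if len(rival_history) < guard_interval:
        # Still inside the first guard interval: keep cooperating.
        return FDM
    recent = rival_history[-guard_interval:]
    if all(action == STP for action in recent):
        # The rival never cooperated during the guard interval.
        return STP
    return FDM
```

With this rule, a single cooperative slot by the rival inside the 
window is enough for the player to keep using FDM. 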

In the previous papers, the spectrum usage schemes are random, 
i.e., the demand parameters are selected randomly at the 
beginning of the game [130], or they work under some predefined 
assumption, for example, SUs choose FDM or STP [136]. 
In [132], the authors proposed an alternative method for the 
use of bandwidth through exchanging information among SUs. 
In particular, before making decisions to access the channels, 
an access point reports its traffic demand to all other access 
points in the same network. Based on the information received 
from the other access points and its own demand, the access 
point makes its channel usage decision. If the interaction among 
access points takes place only once, the access points will 
play a noncooperative game by claiming the highest traffic 
demand. However, when the interaction lasts for multiple 
periods, a repeated game can be used to model the interaction 
and enforce the access points to cooperate by reporting their 
true information. In [132], it is shown that by using trigger 
punishment strategies [?], there exists a subgame perfect 
equilibrium for the proposed repeated game. The length of the 
punishment phase is determined in a similar way to that in [?], 
and with the punishment phase, the selfish access point will 
have no incentive to deviate. 
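
How the length of a punishment phase removes the incentive to 
deviate can be illustrated with a generic trigger-strategy 
calculation. The sketch below is not the rule used in [132]: it 
assumes constant per-period payoffs `u_coop` (cooperation), `u_dev` 
(one-shot deviation), and `u_pun` (during punishment), a discount 
factor `delta`, and a return to cooperation after `T` punishment 
periods. All names are hypothetical. 

```python
def min_punishment_length(u_coop, u_dev, u_pun, delta, max_T=1000):
    """Smallest T that makes a one-shot deviation unprofitable.

    A deviation gains (u_dev - u_coop) once and then loses
    (u_coop - u_pun) in each of the next T discounted periods.
    Returns None if no T up to max_T is long enough.
    """
    if not (u_pun < u_coop < u_dev):
        raise ValueError("expected u_pun < u_coop < u_dev")
    if not (0.0 < delta < 1.0):
        raise ValueError("delta must lie in (0, 1)")
    gain = u_dev - u_coop
    loss = 0.0
    for T in range(1, max_T + 1):
        loss += (delta ** T) * (u_coop - u_pun)
        if loss >= gain:
            return T
    return None
```

With `u_coop=3`, `u_dev=5`, `u_pun=1`, and `delta=0.9`, two 
punishment periods are enough, whereas for an impatient player 
with `delta=0.3` no finite punishment of this form suffices. 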

Extending [132], the authors in [131] used two specific 
cheat-proof mechanisms along with a method to identify 
the information from access points, which was not provided 
in [132]. The first mechanism is based on the “transfer” in 
Bayesian theory [?]. With this mechanism, the access points 
are asked to pay a tax when they declare high traffic demands, 
and the access points with low bandwidth demand are 
compensated from the tax. The tax increases with the declared 
traffic demand, and thus the access points have to balance 
between traffic demand and monetary budget. The payoff of an 
access point is then defined such that it gains the highest 
payoff if it reports true information. In the second mechanism, 
the authors studied a statistical cheat-proof strategy for the 
repeated game. Here, an access point is considered a deviator 
if it overuses the allocated spectrum. After being detected, the 
selfish access point will not be allocated bandwidth until its 
average usage falls within a safety range. 
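
The statistical cheat-proof rule can be sketched as follows. The 
detection test (long-run average usage compared with a fair share 
plus a margin), the `safety_margin` value, and the function name 
are assumptions made for illustration; [131] may define the 
safety range differently. 

```python
def allocate(usage_history, fair_share, safety_margin=0.1):
    """Bandwidth granted to an access point in the next period.

    usage_history: bandwidth actually used in past periods.
    fair_share: the amount the access point is entitled to.
    safety_margin: hypothetical tolerance defining the safety range.
    """
    if not usage_history:
        return fair_share
    average = sum(usage_history) / len(usage_history)
    if average > fair_share * (1.0 + safety_margin):
        # Detected as a deviator: no bandwidth until the average
        # usage falls back into the safety range.
        return 0.0
    return fair_share
```

Because a punished access point uses no bandwidth, its average 
usage decreases over time and the allocation is eventually 
restored. 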

In [134], Xie et al. studied the spectrum sharing problem 
between satellites and terrestrial cognitive radio networks 
(CRNs). The authors considered a scenario where there is 
a satellite in an area with two other CRNs (e.g., two base 
stations), and they compete for a common unlicensed spectrum 
band allocated by primary users. The satellite can interfere 
with both CRNs; however, there is no interference between the 
two CRNs. Since a satellite often moves very fast, it stays in 
the interference area for only a very short time period. 
Due to the short interaction, the repeated game considered in 
this paper is a finitely repeated game. The players of the game 
are the satellite (BS1) and the two CRNs (BS2 and BS3). The 
authors assumed that the total size of the band is 2W, and thus 
the players can choose to occupy one band (i.e., W) or both 
bands (i.e., 2W). The payoff function is the achievable rate. 
The players know when the game ends, and thus there is a 
Nash equilibrium, which is the noncooperative strategy where 
the players always spread their power over the whole 2W band. 
Additionally, the authors addressed the issue of partially blind 
observations. Specifically, BS2 and BS3 cannot observe the 
actions of each other. Moreover, BS2 and BS3 may not know 
of the existence of the satellite and cannot observe its actions. 
To resolve partially blind observations, the authors proposed 
using refreshing TFT (R-TFT). R-TFT is an extension of 
the TFT strategy in which cooperation is refreshed quickly 
once one of the players cooperates. When the observations of 
the players are affected by noise, the TFT strategy becomes 
ineffective. Therefore, the authors investigated two schemes, 
namely, refreshing contrite TFT (RC-TFT) and refreshing 
generous TFT (RG-TFT), which are based on the contrite TFT 
(C-TFT) and generous TFT (G-TFT) strategies in [117]. Here, 
the refreshing process is applied when a satellite appears in 
the area. After an efficient equilibrium is obtained between the 
satellite and the BSs, G-TFT and C-TFT are applied to 
circumvent the adverse impact of noise. 
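
Why plain TFT suffers under noisy observations, and how a 
generous variant helps, can be explored with a small simulation. 
This sketch covers only textbook TFT and generous TFT. It does 
not implement the refreshing step of R-TFT, RC-TFT, or RG-TFT, 
and the `generosity` and `noise` parameters and all function names 
are assumptions. 

```python
import random

C, D = "C", "D"


def flip(action):
    return D if action == C else C


def tft(observed):
    """Plain TFT: cooperate first, then copy the last observed action."""
    return observed[-1] if observed else C


def make_generous_tft(generosity, rng):
    """G-TFT: like TFT, but forgive an observed defection with
    probability `generosity` (a hypothetical tuning parameter)."""

    def strategy(observed):
        if not observed or observed[-1] == C:
            return C
        return C if rng.random() < generosity else D

    return strategy


def cooperation_rate(strategy_a, strategy_b, rounds, noise, rng):
    """Fraction of rounds in which both players actually cooperate.

    Each player sees the other's action flipped with probability `noise`.
    """
    seen_by_a, seen_by_b = [], []
    both = 0
    for _ in range(rounds):
        a, b = strategy_a(seen_by_a), strategy_b(seen_by_b)
        if a == C and b == C:
            both += 1
        seen_by_a.append(flip(b) if rng.random() < noise else b)
        seen_by_b.append(flip(a) if rng.random() < noise else a)
    return both / rounds


if __name__ == "__main__":
    rng = random.Random(1)
    gtft = make_generous_tft(0.3, rng)
    print("TFT  :", cooperation_rate(tft, tft, 1000, 0.05, rng))
    print("G-TFT:", cooperation_rate(gtft, gtft, 1000, 0.05, rng))
```

Without noise, two TFT players cooperate in every round. With 
noise, a single misread action can start a chain of mutual 
retaliation that only forgiveness can break. 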

4) Cooperative Clustering: Recently, there has been some 
research work considering clustering problems in CRNs. In 
clustered CRNs, some SUs are grouped together to access 
a channel cooperatively. This can avoid the over-accessing 
problem when the number of SUs is large. 

Clustering is an effective energy saving solution for SUs 
in CRNs. Traditionally, SUs must transmit data directly to 
distant base stations, causing contention and consuming 
excessive energy. With clustering, some SUs can use short range 
communication to transfer data, thereby reducing congestion 
and energy consumption. However, when clusters are formed, 
there must be an effective way to motivate SUs to cooperate 
by transmitting data for each other. In [142], the authors 
developed a repeated packet forwarding game to model the 
interaction among SUs in the same cluster. Each SU has two 
actions, i.e., cooperate by forwarding packets for others or not. 
The payoff is the profit minus the cost of forwarding packets. 
It is shown that the grim strategy can lead to the Pareto-optimal 
Nash equilibrium where the players want to cooperate. 
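
A short worked condition shows why the grim strategy can sustain 
forwarding. The symbols are assumptions and are not taken from 
[142]: an SU earns a per-period profit $b$ when its packets are 
forwarded, pays a cost $c < b$ for forwarding, discounts future 
payoffs by $\delta \in (0,1)$, and after any defection no SU 
forwards again, so every later payoff is zero. 

```latex
\begin{align}
U_{\text{coop}} &= \sum_{t=0}^{\infty} \delta^{t}\,(b-c) = \frac{b-c}{1-\delta}, \\
U_{\text{dev}}  &= b + \sum_{t=1}^{\infty} \delta^{t}\cdot 0 = b, \\
U_{\text{coop}} \ge U_{\text{dev}}
  &\iff b-c \ge (1-\delta)\,b
   \iff \delta \ge \frac{c}{b}.
\end{align}
```

Under these assumptions, cooperation is an equilibrium outcome 
whenever the players are patient enough relative to the 
cost-to-profit ratio. For example, with $b=4$ and $c=1$ the 
condition is $\delta \ge 0.25$. 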

Fi et al. [143] extended the clustering model in CRNs 
proposed in [142] by using a cluster-head SU. In particular, 
each cluster has a cluster-head SU, which is responsible 
for receiving packets from other SUs in the same cluster and 
transmitting them over primary channels, as shown in Fig. 12. 
With the centralized model through the cluster-head SU, the 
SUs can avoid frequent collisions and reduce the delay of packet 
transmission. The clustering and the nomination of the 
cluster-head SU are based on weight metrics, including 
the ideal degree, transmission power, and battery power, as 
introduced in [144]. A repeated game is used to model the 
interaction between SUs in the same cluster, and by using 
the grim strategy, the authors proved that the game can achieve 
Pareto optimality. 



Fig. 12. A repeated game for clustered cognitive radio networks. 


C. Spectrum Trading 

Repeated games are also used for spectrum trading between 
SUs and primary users (PUs). In CRNs, PUs can sell their 
spectrum to SUs to gain revenue and improve spectrum 
utilization. However, when there are multiple PUs selling 
spectrum, they have to choose appropriate offer prices to attract 
SUs and compete with each other. A repeated game is used to 
help PUs in pricing. 

TABLE V 

Applications of repeated games in cognitive radio networks 

| Problem | Article | Players | Actions | Payoff | Strategy | Solution |
|---|---|---|---|---|---|---|
| Spectrum Sensing | [115] | Secondary users | sharing spectrum results | profit minus cost | Carrot-and-Stick | SPE |
| Spectrum Sensing | [?] | Secondary users | sharing spectrum results and forwarding packets | profit minus cost | TFT and Generous-TFT | SPE |
| Spectrum Sensing | [?] | Secondary users and malicious users | sharing spectrum results | profit minus cost | cheat-proof | SPE |
| Spectrum Management | [119], [121] | Secondary users | channel access probability | achievable throughput | punishment trigger | SPE |
| Spectrum Management | [?] | Secondary users | backoff counter | achievable throughput | cheat-proof | Pareto optimal |
| Spectrum Management | [?], [?] | pairs of transmitter-receiver | power levels | transmission rates | TFT | SPE |
| Spectrum Management | [?] | pairs of transmitter-receiver | power levels | achievable throughput | cheat-proof | SPE |
| Spectrum Management | [125] | pairs of transmitter-receiver | power levels on multi-carrier | achievable rates | Cartel maintain | Pareto optimal |
| Spectrum Management | [127] | pairs of transmitter-receiver | power levels | achievable throughput | cheat-proof | perfect public |
| Spectrum Management | [128] | pairs of transmitter-receiver and an intervention | power levels | achievable throughput | forgiving | SPE |
| Spectrum Management | [130] | WLANs | MAC parameters | a function of QoS | Grim, TFT, and static strategies | SPE |
| Spectrum Management | [136] | secondary users | FDM or STP | achievable transmission rate | adaptive | SPE |
| Spectrum Management | [132] | access points | exchange information | spectrum usage | punishment trigger | SPE |
| Spectrum Management | [131] | access points | exchange information | spectrum usage | cheat-proof | SPE |
| Spectrum Management | [134] | a satellite and two base stations | spectrum usage | achievable rate | variations of TFT | SPE |
| Spectrum Management | [142], [143] | SUs in the same cluster | forward or reject | profits minus cost | Grim | Pareto optimal |
| Spectrum Trading | [137] | primary users | prices | profits | Grim | SPE |
| Spectrum Trading | [139] | primary users | prices | profits | Grim | SPE |

In [137], the authors considered the interaction among 
primary services when they provide and sell opportunistic 
spectrum access to SUs. The players are the primary services, 
and the strategy is to offer the price per unit of spectrum. 
The payoff is the profit from selling bandwidth to SUs minus 
the cost of spectrum sharing, e.g., due to QoS degradation 
of PUs. The authors first formulated the competitive pricing 
problem among PUs as a Bertrand game [138]. If this game 
is played only once, then the solution of the game is the 
noncooperative and inefficient Nash equilibrium. However, if 
the game is played repeatedly, there is a motivation for the 
primary services to cooperate by offering collusive prices that 
maximize their profits. A trigger strategy is used to force 
the players to cooperate. 
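
The patience required for collusive pricing under a trigger 
strategy follows from a standard textbook comparison, sketched 
below. It is not the specific condition derived in [137]: the 
per-period profits `pi_collude`, `pi_deviate`, and `pi_nash` and 
the function names are assumptions, and the punishment is taken 
to be permanent reversion to the one-shot Nash equilibrium. 

```python
def critical_discount_factor(pi_collude, pi_deviate, pi_nash):
    """Smallest discount factor that sustains collusion.

    Colluding forever yields pi_collude / (1 - delta). Deviating
    yields pi_deviate once and pi_nash in every later period.
    Collusion is sustainable when the first is at least the second,
    i.e. delta >= (pi_deviate - pi_collude) / (pi_deviate - pi_nash).
    """
    if not (pi_nash < pi_collude < pi_deviate):
        raise ValueError("expected pi_nash < pi_collude < pi_deviate")
    return (pi_deviate - pi_collude) / (pi_deviate - pi_nash)


def collusion_sustainable(delta, pi_collude, pi_deviate, pi_nash):
    return delta >= critical_discount_factor(pi_collude, pi_deviate, pi_nash)
```

For example, with profits 3 (collude), 5 (deviate), and 1 (Nash), 
the threshold is 0.5, so players with a discount factor of 0.6 
would keep the collusive price while players with 0.4 would not. 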

In [137], the authors assumed that PUs always have 
available spectrum to sell to SUs. However, in reality, such 
licensed spectrum may not be available when PUs are using it. 
The authors in [139] considered this problem by assuming a 
probability of having a unit of available bandwidth in each 
time slot. The authors showed that there is no pure Nash 
equilibrium for the one-shot game, and thus they 
considered a special class of Nash equilibria, called a 
symmetric Nash equilibrium [140]. The one-shot game 
is then proven to possess a unique symmetric Nash equilibrium. 
For the repeated game, the subgame perfect Nash equilibrium 
(SPNE) is adopted. However, the efficiency of the SPNE is 
shown to be low. Thus, the authors proposed a Nash reversion 
strategy [?] to make players cooperate and improve the 
efficiency of the SPNE. The Nash reversion strategy is stated 
as follows. At the beginning of the game, the players select a 
maximum price, and this selection is maintained as long 
as no player deviates. If at least one player deviates, 
all players play the symmetric Nash equilibrium strategy 
of the one-shot game. Finally, the efficiency of the above 
SPNE is proven to be maximal if the discount factor satisfies 
a certain condition. 
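
The Nash reversion rule stated above can be written as a short 
decision function. This is a sketch under assumptions: the 
history format, the exact-equality deviation test, and the 
`sample_nash_price` callable (a placeholder for drawing a price 
from the one-shot symmetric equilibrium) are hypothetical and are 
not specified in this form in [139]. 

```python
def nash_reversion(history, max_price, sample_nash_price):
    """Price a player offers in the next time slot.

    history: list of past rounds, each a tuple with every
        player's offered price in that round.
    max_price: the collusive (maximum) price agreed at the start.
    sample_nash_price: hypothetical callable returning a price
        drawn from the one-shot symmetric Nash equilibrium.
    """
    deviated = any(price != max_price
                   for prices in history
                   for price in prices)
    # Once any deviation has been observed, revert permanently.
    return sample_nash_price() if deviated else max_price
```

Since the whole history is scanned, a single past deviation 
triggers the one-shot equilibrium play in all later slots. 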

Summary: From Table V and Fig. 13, we observe that 
most of the repeated game models developed for CRNs are 
conventional repeated games. Their main aim is to solve 
spectrum management problems. By contrast, there are few 
repeated game models for cooperative spectrum sensing 
and spectrum trading. Moreover, there is only one finitely 
repeated game model [134]. 

VI. Applications of Repeated Games in Other 
Networks 

Apart from the traditional wireless networks, in this section, 
we review some important applications of repeated games 
in special types of networks including network coding, fiber 
wireless access, and multicast networks. 

A. Wireless Network Coding 

Fig. 13. Relation among types of repeated games, strategies, and solutions (RG = repeated game, SPE = subgame perfect equilibrium, TFT = Tit-for-Tat) 
in cognitive radio networks. 

Fig. 14. Wireless network coding. 

The core idea of network coding is to combine received 
packets and transmit them together in the same flow 
instead of simply forwarding each packet separately [?]. 
In [147] and [148], the authors introduced the use of network 
coding in a butterfly network including two source nodes $s_1$ 
and $s_2$ and two destination nodes $t_1$ and $t_2$, as shown in 
Fig. 14. Packets from $s_1$ and $s_2$ are of two types, “routing” 
or “network coding”. A routing packet $y_n$ will be forwarded 
immediately after it is sent to node $i$. A network coding packet 
$z_n$ will be encoded together with others using XOR encoding 
[146]. When source nodes $s_1$ and $s_2$ send network coding 
packets to node $i$, they also have to send “remedy data” $v_n$ 
to their destinations $t_1$ and $t_2$ through their side links 
$(s_1, t_2)$ and $(s_2, t_1)$, respectively. The transmission on 
each link incurs a certain cost. Thus, the source nodes need to 
decide on the transmission rates, i.e., $y_n$, $z_n$, and $v_n$. 
For a noncooperative one-shot game, it is proven that the nodes 
will choose the noncooperative strategy ($z_n = v_n = 0$). In 
particular, the nodes will not use network coding, to avoid 
payments over the links $(s_1, t_2)$ and $(s_2, t_1)$, and this 
strategy is the unique Nash equilibrium of the game. However, 
if the game is played repeatedly with the grim trigger strategy, 
the nodes can achieve a subgame perfect equilibrium in which 
they have an incentive to cooperate. Additionally, to 
find the efficient common network coding rate for the nodes, 
a bargaining game model is introduced. The results show 
that by using the proposed solution, the worst-case efficiency 
compared with the optimal network performance can be 
upper-bounded by 48%, as shown in [151] and [152]. In [153], 
the repeated game for network coding considers not only the 
forwarding action, but also the number of transmissions, the 
set of upstream nodes, and the set of downstream nodes. The 
authors showed that there exists a strategy profile that is a 
subgame perfect equilibrium. 
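
The XOR encoding and the role of the remedy data on the side 
links can be made concrete with a minimal sketch. It assumes 
equal-length packets, that the packet of $s_1$ is destined for 
$t_1$ and that of $s_2$ for $t_2$, and that the side links 
deliver the remedy data without loss. The function names are 
hypothetical. 

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length packets."""
    if len(a) != len(b):
        raise ValueError("packets must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def butterfly(packet_s1: bytes, packet_s2: bytes):
    """Return the packets decoded at t1 and t2.

    Node i sends a single coded packet to both destinations
    instead of forwarding the two packets separately.
    """
    coded = xor_bytes(packet_s1, packet_s2)

    # Remedy data over the side links (s1, t2) and (s2, t1).
    remedy_at_t2 = packet_s1
    remedy_at_t1 = packet_s2

    decoded_at_t1 = xor_bytes(coded, remedy_at_t1)  # recovers s1's packet
    decoded_at_t2 = xor_bytes(coded, remedy_at_t2)  # recovers s2's packet
    return decoded_at_t1, decoded_at_t2
```

Each source thus pays for a side-link transmission that mainly 
benefits the other flow, which is the source of the incentive 
problem addressed by the repeated game. 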


B. Fiber Wireless Access Networks 

In [149] and [150], the authors applied a repeated game 
to fiber-wireless (FiWi) access mesh networks (AMNs). 
FiWi AMNs differ from traditional wireless mesh networks 
(TWMNs): in TWMNs, packets can be sent to any node in the 
network, whereas in FiWi AMNs, packets are always transmitted 
through gateways. Therefore, FiWi AMNs are concerned with the 
interaction between forwarding and gateway nodes. In the 
repeated game, the players are the forwarding and gateway 
nodes. The actions are to set the amounts of foreign and local 
traffic. The payoff of a forwarding node is a gain function 
minus a cost function, while the payoff of a gateway node is 
only a gain function. A forwarding node can defect by 
forwarding a small amount of foreign traffic and a large amount 
of local traffic. However, the gateway can punish the defecting 
node and thus curtail its future payoff. Through the 
repeated game model, there is an incentive for the nodes to 
cooperate, resulting in better network performance. 

C. Wireless Multicast Networks 

The authors in [?] examined the application of repeated 
games in wireless multicast networks. The authors considered 
a single hop multicast network with a base station and a group 
of nodes receiving packets from the base station in two 
phases. In the first phase, the base station transmits multicast 
packets to the group of nodes and selects a relay node that 
successfully receives the packets from the base station. In the 
second phase, the relay node rebroadcasts the packets to 
the other nodes. The relay node can gain some benefit from the 
other nodes, but relaying incurs a certain cost. 
Therefore, selfish nodes may not be interested in forwarding 
packets. A repeated game model is developed to encourage 
the nodes to cooperate. In this game, the nodes are the players, 
and if a node is selected as the relay node, it has to choose 
the transmission power level to rebroadcast packets. The payoff 
is the reward from the number of successfully forwarded packets 
minus the cost. The worst behavior TFT incentive strategy is 
proposed to enforce cooperation among the nodes. By using 
the punishment strategy, the authors proved that the nodes can 
achieve a subgame perfect equilibrium in both the cases of 
perfect monitoring and imperfect monitoring. 

Fig. 15. Summary information: (a) the percentage of repeated game models, (b) the percentage of wireless network models, and (c) research trends. 

D. Other Networks 

Besides the aforementioned networks, there are some 
applications of repeated games in miscellaneous wireless 
networks such as vehicular ad hoc networks (VANETs) [?], small 
cell networks [156], wireless network virtualization [?], and 
wireless mesh networks [158]. In [?], a repeated game is 
used to model packet forwarding strategies of vehicular nodes 
based on the QoS optimized link state routing (OLSR) protocol. 
In [156], the power control problem of small cell networks is 
modeled as a repeated game. The setting is similar to other 
typical repeated game-based power control schemes except 
that there is co-channel interference among densely deployed 
small cells. 

VII. Summaries, Open Issues and Future Research 
Directions 

A. Summary 

In Fig. 15, we summarize key information from this survey. 
From Fig. 15(a), we observe that most papers focus 
on conventional repeated games (approximately 60%), 
while other extensions have not been used much. This is a 
potential research opportunity. Furthermore, we summarize the 
percentage of repeated game models for different types of 
wireless networks in Fig. 15(b). We also show the research 
trend for each network in Fig. 15(c). From Fig. 15(b), we 
observe that the majority of applications are for wireless ad 
hoc networks, followed by cognitive radio networks, and then 
cellular networks and WLANs. However, from Fig. 15(c), while 
the applications of repeated games in traditional networks 
subside, those in emerging networks have increasingly grown. 

B. Challenges and Open Issues of Repeated Games in 
Wireless Networks 

1) Applications with finitely repeated games: In this 
survey, most repeated game models are for an infinite time 
horizon, and there is only one application of a finitely repeated 
game [134]. However, there are many issues in wireless 
networks that require models developed for a short-term 
context, such as [?], [?], and [?]. Therefore, the study of 
effective solutions to finitely repeated games is essential. 

2) Defining payoff functions: Similar to other game models, 
the most challenging part of developing repeated game models 
for wireless networks is defining a payoff. The payoff function 
must capture and balance network performance and individual 
player’s incentive. More importantly, the physical meaning 
of the function must be well justified. Moreover, some 
important properties of repeated games (e.g., existence of an 
equilibrium) depend on the payoff function. Therefore, it is 
important to determine a formal approach to define the payoff 
function of the players in the game. 

3) Sequential decisions: In repeated games, the actions of 
players are assumed to be taken simultaneously. However, 
in some wireless networks, the players may not take actions 
at precisely the same time. One solution to such a 
situation, specifically sequential decision making, is a 
Stackelberg game. The combination of a repeated game and a 
Stackelberg game allows us to solve non-simultaneous decision 
making problems. In [101], the authors considered this game 
setting. 

4) The mixed-strategy: Repeated game models often assume 
that all players follow the same strategy. For example, if one 
player uses the TFT strategy, all other players are also assumed 
to play the TFT strategy. This assumption may not hold in 
cases where different players adopt different strategies, 
which makes it difficult to find an equilibrium of the 
game. In [?], the authors studied this problem through 
simulations and stated that evolutionary games can be one of 
the potential solutions. 

C. Potential Research Directions of Repeated Games in Next 
Generation Wireless Communication Systems 

1) Mobile cloud computing: Mobile cloud computing 
(MCC) is a convergence model of mobile devices, service 
providers, wireless networks, and cloud computing that brings 
benefits to mobile users, mobile application developers, and 
cloud providers [?]. In MCC, different entities can cooperate 
to achieve better performance and profit. For example, cloud 
providers can cooperate with mobile operators. To encourage 
them to cooperate, a repeated game can be applied. 

2) Wireless powered communication networks: The 
remarkable advancement of wireless energy harvesting and 
transfer technologies has recently created many new research 
directions [?], [?]. The introduction of wireless energy 
harvesting techniques has addressed an inherent problem of 
wireless networks, i.e., the energy constraint of wireless nodes, 
thereby enabling many new applications such as wireless body 
area networks [?] or wireless charging [166]. However, as 
energy becomes a resource that can be shared efficiently, 
incentive mechanisms for wireless energy harvesting and transfer 
nodes have to be developed. Repeated games can be adopted 
for such mechanisms. 

3) Machine-to-machine communication in 5G networks: 
With the emergence of the Internet of Things (IoT) and 5G 
networks, the number of wireless devices is expected to 
increase dramatically [167]. In machine-to-machine (M2M) 
communication, the wireless devices have to contend for 
limited radio resources, and cooperation is a desirable behavior 
to achieve efficient network operation. However, due to the 
large number of devices, enforcing cooperation becomes a 
complex task, and a conventional repeated game cannot be 
simply applied. An extension of the game to support a large 
number of players is worth studying. 

VIII. Conclusion 

A repeated game is a powerful tool to model and resolve 
interactions and conflicts in wireless networks. In this paper, 
we have presented a comprehensive survey of the repeated 
game models developed for a variety of wireless networks. 
We have provided the basics and fundamentals of the game 
and demonstrated its advantages. The applications of 
repeated games are divided based on the types of networks, 
i.e., cellular networks and WLANs, wireless ad hoc networks, 
and cognitive radio networks. In addition to detailed reviews of 
the related work, we have also provided analysis, comparisons, 
and summaries of the literature. Furthermore, some open issues 
and future research directions of repeated games in wireless 
networks have been highlighted. In conclusion, this paper will 
be a keystone for understanding repeated games in wireless 
communication systems. 